ABSTRACT

One  aspect  of  efficient  management  of  resources  that 
cannot  be  overstated  is  accurate  cost  estimation.  The 
learning  curve  technique  used  in  cost  estimation  continues 
to  be  a  significant  tool  by  itself  and  as  an  important 
factor  in  other  cost  estimation  algorithms.  This  study 
conducts  an  empirical  investigation  of  a  theoretical 
reformulation  of  the  cumulative  average  learning  curve.  The 
model  is  empirically  corroborated  by  comparison  of  linear 
and  nonlinear  regression  results  with  the  classical  unit  and 
cumulative  average  learning  curve  specifications  using  two 
sets  of  aircraft  production  data.  When  autocorrelation  was 
present  and  subsequently  modeled  into  the  data,  the 
resulting  linear  models  were  significantly  distorted  whereas 
the  non-linear  models  were  not.  While  the  model  being 
scrutinized  was  adequate,  the  unit  learning  curve  appeared 
to  be  the  superior  model. 
I.   INTRODUCTION

A.   BACKGROUND 

In  March  of  1972,  the  General  Accounting  Office  sent  a 
preliminary  report  to  Congress  dealing  with  the  acquisition 
of major weapon systems [Ref. 1:p. 1]. The GAO reported
that  the  Navy  had  experienced  a  cost  growth  of  $19  billion 
on  twenty-four  weapon  systems  in  FY  1971,  of  which  15 
percent  was  attributed  to  poor  cost  estimation.  Inaccurate 
cost  estimates  for  weapon  systems  can  result  in  program 
delays,  cost  overruns,  acquisition  of  systems  that  are  not 
the  most  cost  effective,  and  a  lack  of  taxpayer  confidence 
in  military  leaders,  to  name  only  a  few  of  the  consequences. 
Congressional  concern  and  a  continuing  need  for  better 
planning  estimates  have  made  it  imperative  that  new 
techniques  be  developed  and  old  methods  be  improved  to 
obtain  better  cost  estimates  for  major  weapon  system 
production  and  acquisition  [Ref.  2:p.  1].  In  the  area  of 
cost  estimation,  an  old  technique  that  continues  to  be  a 
significant  tool  is  the  learning  curve. 

The first study addressing the learning curve phenomenon
was documented by the pioneer of the learning curve, T. P.
Wright of the Curtiss-Wright Corporation, in his 1936 paper,
"Factors Affecting the Cost of Airplanes" [Ref. 3:p. 32].
His analysis of data collected over a number of years
beginning in 1922 concerned the relationship of production
quantity with cost as measured in direct labor hours.
Wright claimed that each time the cumulative production
quantity doubled, the average unit cost for that quantity
decreased by a constant percentage, and that this
relationship plotted as a straight line on logarithmic
paper. Wright's formulation of the learning curve was:


$$Y_c = aX^b$$


where 

$X$: cumulative production quantity

$Y_c$: average cost per unit

$b$: factor of cost variation

$a$: direct manhour cost for production unit number one

Based on most of the literature available, it can safely
be said that the principal factors contributing to the
existence of this learning phenomenon include considerably
more than just operator learning. Conway and Schultz
[Ref. 4:p. 42] believe that learning in aircraft production
is influenced by a number of during-production factors
including:

1)  incentive  pay 

2)  changes  in  tooling 

3)  design  changes 
4) management learning

5)  volume  changes 

6)  quality  improvements 

The rate of a learning curve is usually described by the
complement of the reduction achieved when the production
quantity is doubled. This value is usually called the slope
of the curve and is computed as:


$$S = \frac{Y_{2X}}{Y_X} = \frac{(2X)^b}{X^b} = 2^b$$


where 

b:   slope  of  learning  curve 

S:   fraction  to  which  the  cost  decreases  when  production 
quantity  doubles 
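
As a small numerical illustration of the relation between the slope S and the exponent b, the following Python sketch converts between the two and evaluates a cumulative average curve. The function names, the 80 percent slope, and the 1000-hour first-unit cost are illustrative assumptions, not values taken from this study.

```python
import math


def slope_to_exponent(slope):
    """Return b such that 2**b equals the quoted slope S (0 < S <= 1)."""
    if not 0.0 < slope <= 1.0:
        raise ValueError("slope must lie in (0, 1]")
    return math.log2(slope)


def exponent_to_slope(b):
    """Return the slope S = 2**b for a factor of cost variation b."""
    return 2.0 ** b


def cumulative_average_cost(a, b, x):
    """Cumulative average cost Y_c = a * X**b."""
    return a * x ** b


if __name__ == "__main__":
    b = slope_to_exponent(0.80)  # hypothetical 80 percent curve
    for x in (1, 2, 4, 8):
        # Each doubling of X multiplies the average cost by the slope S.
        print(x, round(cumulative_average_cost(1000.0, b, x), 1))
```

Under these assumed values, each doubling of the cumulative quantity multiplies the average cost by 0.80.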

Wright believed that the cumulative average learning
phenomenon plotted linearly on logarithmic scales, and the
unit learning curve formulation derived from this cumulative
equation would be [Ref. 5:p. 266]:


$$Y_c = aX^b$$

$$Y_T = Y_c \cdot X = aX^{b+1}$$

So,

$$Y_X = a\left(X^{b+1} - (X - 1)^{b+1}\right) \approx a(b + 1)X^b \quad \text{as } X \to \infty$$


where 

$Y_c$: average cost per unit

$Y_T$: total cumulative cost

$Y_X$: cost of the Xth unit

a,b:  parameters  of  the  formulation 
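
The gap between the exact unit cost implied by the cumulative average curve and its large-X approximation can be checked numerically. The sketch below is illustrative only; the parameter values (a = 1000 and an 80 percent slope) and the function names are assumptions and are not drawn from the data in this study.

```python
import math


def unit_cost_exact(a, b, x):
    """Unit cost implied by the cumulative average curve: Y_T(X) - Y_T(X - 1)."""
    return a * (x ** (b + 1) - (x - 1) ** (b + 1))


def unit_cost_approx(a, b, x):
    """Large-X approximation a * (b + 1) * X**b."""
    return a * (b + 1) * x ** b


if __name__ == "__main__":
    a, b = 1000.0, math.log2(0.80)  # hypothetical parameters
    for x in (1, 2, 10, 100, 1000):
        exact = unit_cost_exact(a, b, x)
        approx = unit_cost_approx(a, b, x)
        print(x, round(exact, 2), round(approx, 2), round(exact / approx, 4))
```

With these assumed parameters the two expressions differ substantially at the first few units and agree closely once X is large.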

J.  R.  Crawford,  another  major  contributor  to  the 
literature  and  theory  of  learning  curves,  disagreed  with 
T.  P.  Wright  in  the  log-linear  formulation  of  the  cumulative 
average  learning  curve  [Ref.  6:p.  21].  His  disagreement  was 
based  on  the  apparently  steep  slope  between  early  production 
units  of  the  unit  learning  curve  derived  from  the  cumulative 
curve.  In  Crawford's  studies,  he  described  the  learning 
phenomenon  in  what  has  been  termed  the  unit  learning  curve: 


$$Y_X = aX^b$$


where 

$Y_X$: cost of the Xth unit

X:  cumulative  amount  of  units  produced 

a:  manhour  cost  for  the  first  production  unit 

b:  factor  of  cost  variation 




The  cumulative  average  cost  curve  derived  from  the  unit 
curve  is  [Ref.  6:p.  21]: 


$$Y_X = aX^b$$

$$Y_T = a \sum_{x=1}^{X} x^b$$

$$Y_c = \frac{a \sum_{x=1}^{X} x^b}{X} \approx \frac{a}{1 + b} X^b \quad \text{as } X \to \infty$$

where

$Y_X$: cost of the Xth unit

$Y_T$: total cumulative cost

$Y_c$: average cost per unit produced
a,b:  formulation  parameters 
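
To see how quickly the summed unit curve approaches its large-X approximation, the following sketch compares the exact cumulative average with the approximation (a / (1 + b)) X^b. The parameter values and function names are assumptions chosen for illustration.

```python
import math


def cumulative_average_exact(a, b, x):
    """Average of the unit costs a * k**b for k = 1..X."""
    return a * sum(k ** b for k in range(1, x + 1)) / x


def cumulative_average_approx(a, b, x):
    """Large-X approximation (a / (1 + b)) * X**b."""
    return a / (1 + b) * x ** b


if __name__ == "__main__":
    a, b = 1000.0, math.log2(0.80)  # hypothetical parameters
    for x in (1, 5, 50, 1000):
        exact = cumulative_average_exact(a, b, x)
        approx = cumulative_average_approx(a, b, x)
        print(x, round(exact, 2), round(approx, 2), round(exact / approx, 4))
```

Under these assumptions the approximation overstates the exact average noticeably for the first few units, which is the range from which learning curves are usually estimated.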

For  years  both  the  unit  learning  curve  and  the 
cumulative  average  learning  curve  have  been  used  almost 
interchangeably.  Womer  and  Patterson  [Ref.  5:p.  266]  show 
and  conclude  this  is  so  because  for  large  values  of  X,  each 
curve  is  a  good  approximation  for  the  other.  They  go  on  to 
say  that  a  problem  arises,  however,  since  learning  curves 
are generally formulated on the first few units of output to
forecast the cost of an entire production. Even though
forecasts  may  be  for  large  values  of  X,  the  data  used  to 
make  them  are  not.  Under  these  circumstances,  the  estimated 
cumulative  average  learning  curve,  for  example,  may  approach 
a  unit  learning  curve,  but  not  necessarily  the  same  unit 
curve that would be approximated from early units. The
choice of log-linear learning curve specification, unit or
cumulative, has through the years been a source of
inaccurate cost estimation. Although 93 percent of all
firms utilize Crawford's unit learning curve [Ref. 7:p. 23],
there are enough exceptions to suggest that experience is
the best guide for choosing a particular model.

Following World War II, Gardner Carr of the McDonnell
Aircraft Corporation felt that representing learning curves
as linear on logarithmic paper was an inaccurate portrayal
of the learning phenomenon. In his April 1946 article
[Ref. 8:p. 77], Carr argued that the straight line was
adequate for overall project statistics but rarely
correct for budget or actual cost finding purposes. He
believed that the cumulative average learning curve was
S-shaped on the logarithmic scale. Explanations for the
various segment shapes of this curve are found in a RAND
report by Asher, "Cost Quantity Relationships in the
Airframe Industry" [Ref. 6:p. 28].
Another study which suggested that learning curves do
not  adhere  to  log-linearity  was  conducted  by  the  Stanford 
Research  Institute  following  World  War  II.  The  Stanford 
system  utilizes  the  'B-factor'  which,  basically,  modifies 
the  standard  learning  curve  for  prior  experience.  The 
formulation  of  this  learning  curve  is: 


$$Y = a(X + B)^b$$


where 

Y:   cost  per  unit  in  manhours 

a:   theoretical  first  unit  cost 

X:   cumulative  quantity  produced 

B:   modification  factor 
The effect of this formulation is a concave curve on the
logarithmic scale. The cost of the first unit is depressed
and the curve arcs to the standard learning curve
[Ref. 7:p. 8].
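
The effect of the B-factor can be sketched numerically. The example below assumes the commonly cited Stanford-B form Y = a(X + B)^b; the function name and all parameter values are hypothetical and serve only to show the depressed first-unit cost and the convergence to the standard curve.

```python
import math


def stanford_b_cost(a, b, x, big_b):
    """Unit cost under the assumed Stanford-B form Y = a * (X + B)**b."""
    return a * (x + big_b) ** b


if __name__ == "__main__":
    a, b = 1000.0, math.log2(0.80)  # hypothetical parameters
    for x in (1, 10, 100, 10000):
        standard = stanford_b_cost(a, b, x, 0.0)  # B = 0 gives the standard curve
        modified = stanford_b_cost(a, b, x, 4.0)  # assumed prior-experience factor
        print(x, round(standard, 2), round(modified, 2))
```

Setting B to zero recovers the standard log-linear curve, so B can be read as a measure of how much prior experience shifts the starting point.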

Further research that deviated from the log-linearity
hypothesis was conducted. Another perspective of the
production process is that various departments contribute to
the overall quantity of direct labor hours. Generally
speaking, these departments are fabrication, subassembly,
major assembly, and final assembly. It seems obvious that
each department contributing to the learning curve would
itself have its own learning curve. In order for the various
departments to have their learning effects sum to an overall
production process log-linear learning curve, each of the
department slopes must be identical. In practice, the
various departments often have different slopes. Summing
these curves would result in a departure from log-linearity
and arrive at a convex curve whose slope is bounded by the
flattest of the component curves. In "Cost Quantity
Relationships in the Airframe Industry" [Ref. 6:p. 69],
Asher uses this argument while conducting a significant
analysis disputing the log-linear hypothesis of the
formulation of the learning curve. In his report, he also
cites research done previously by P. B. Crouse, G. M.
Giannini, and P. Guibert supporting his contentions. Asher
concludes, however, that his study

. . . does not discredit the use of the linear progress
curve . . . . The linear curve is useful for making
extrapolations beyond the data range provided the number
of additional units is small. It is clearly a matter of
judgement whether or not in a specific instance the linear
curve is appropriate . . . . If allowable error is
relatively small, a convex curve resulting from predicting
each of the component curves separately is probably more
appropriate.
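
The departmental argument above can be illustrated with a short calculation. The sketch below sums three log-linear departmental curves with different exponents and reports the local slope of the total on logarithmic scales. The departmental parameters are hypothetical and are not taken from Asher's report.

```python
def total_hours(x, departments):
    """Sum of departmental unit curves a_i * X**b_i."""
    return sum(a * x ** b for a, b in departments)


def local_slope(x, departments):
    """Local log-log slope d ln(Y) / d ln(X) of the summed curve at X."""
    weighted = sum(a * b * x ** b for a, b in departments)
    return weighted / total_hours(x, departments)


if __name__ == "__main__":
    # (first-unit hours, exponent) per department; values are assumed.
    departments = [(500.0, -0.5), (300.0, -0.3), (200.0, -0.1)]
    for x in (1, 10, 100, 1000, 10000):
        print(x, round(local_slope(x, departments), 4))
```

The local slope is a weighted average of the departmental exponents, with weights shifting toward the flattest department as X grows, so the summed curve flattens steadily and never becomes flatter than that department's curve.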

Another approach to research in the theory of learning
curves has involved the inclusion of production rate as an
explanatory variable in learning curve models. In his
1963 article [Ref. 9:p. 679], Alchian cites work done in
1948 that concluded production rate is not a relevant
variable. Results published by Smith [Ref. 10:p. 138], and
supported by Kinton and Congelton [Ref. 11:p. 92], concluded
that production rate plays a significant role in explaining
the effects of learning, but other studies with
contradictory results exist. Womer and Gulledge have
produced a considerable literature discussing the effects of
production rate, which resulted in a final report for the
Air Force [Ref. 12:p. 5] addressing the contradictory
results of previous research and developing a cost function
that includes production rate and the cost-quantity
relationship of learning curve theory.

In  his  article  "The  Learning  Curve:  Historical  Review 
and  Comprehensive  Study"  [Ref.  13:p.  302],  Yelle  states  that 
most of the literature in learning curve theory, from its
inception through the 1960's, has focused primarily on
military applications in the early years through World War
II and on industry and business in the more recent years.
Through  the  years  and  various  paths  that  research  in  this 
area  has  followed,  most  of  the  studies  do  not  reach 
consistent  conclusions.  The  early  goals  of  developing  a 
general  formulation  of  the  learning  curve  that  could  be 
applied  to  the  entire  aircraft  industry  or  subsets  of  it 
were  quickly  abandoned.  Despite  the  vast  amounts  of 
literature  disputing  the  log-linear  relationship  between 
cost and cumulative quantity produced, the unit learning
curve is still the most widely used formulation of the
learning curve in cost estimation today [Ref. 7:p. 7].
B.   OBJECTIVES

The preceding pages and references provide a brief
summary of the research expended on the theory of the
learning curve over the past half century. The important
point is that the learning phenomenon, and the numerous
formulations of this theory in aircraft and other
industries, have been an area of extensive research and that
the learning curve continues to be a viable tool in the
world of production economics.

The  purpose  of  this  research  is  to  conduct  an  empirical 
study  of  still  another  theoretical  reformulation  of  the 
learning  curve.  In  "Budgets,  Contracts,  Incentives  and 
Costs:  A  Stylized  Nexus",  by  Boger,  Jones  and  Sontheimer 
[Ref.  14:p.  23],  the  cumulative  average  learning  curve  is 
reformulated  to  examine  the  influence  cost  forecasting  and 
budget  formation  have  on  the  incentives  bearing  on  the  firm 
for cost control. The model developed by Boger et al., a
cumulative average learning curve model, and a unit learning
curve model will be estimated through simple linear and
nonlinear regression techniques using two sets of aircraft
production data. For each formulation of the learning
curve, the models resulting from the two fitting techniques
will be analyzed, validated, and compared. Finally, the
Boger et al. model will be compared with the classical
learning curve models for empirical validation.
II.   THE MODELS

A.   CUMULATIVE  AVERAGE  LEARNING  CURVE 

The  cumulative  average  learning  curve,  as  discussed 
above,  was  first  formulated  by  T.  P.  Wright  in  the  1930's. 
The  log-linear  relationship  between  cumulative  production 
quantity  and  average  cost  per  unit  is: 


$$Y_c = aX^b$$


where 

X:    cumulative  production  quantity 

$Y_c$: average cost per unit

b:    factor  of  cost  variation 

a:    direct  manhour  cost  for  first  unit 

The  cumulative  production  quantity  is  usually  expressed 
as  an  integer  number  of  units  produced.  The  cost  variable 
is  measured  in  direct  manhours  expended  in  the  production  of 
the  cumulative  quantity  produced.  We  expect  the  learning 
curve  slope,  factor  of  cost  variation,  to  have  a  negative 
value  when  we  anticipate  the  presence  of  learning  in  the 
production  of  some  product.  This  formulation  also 
presupposes  a  relatively  constant  rate  of  production  and 
uniformity  of  units  produced.    Deviations  from  these  last 
assumptions are recognizable in a plot of the raw data,
i.e.,  toe  up,  toe  down,  bottom  out,  scallop. 
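
Because the log-linear form becomes a straight line after taking logarithms of both sides, its parameters can be estimated by ordinary least squares on the logged data. The following is a minimal sketch of that idea using only the standard library; the function name and the synthetic data are assumptions for illustration, and the code is not the estimation routine used in this study.

```python
import math


def fit_log_linear(xs, ys):
    """Ordinary least squares of ln(y) on ln(x); returns (a, b) for y = a * x**b."""
    log_x = [math.log(x) for x in xs]
    log_y = [math.log(y) for y in ys]
    n = len(log_x)
    mean_x = sum(log_x) / n
    mean_y = sum(log_y) / n
    sxx = sum((u - mean_x) ** 2 for u in log_x)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(log_x, log_y))
    b = sxy / sxx                        # slope of the line in log space
    a = math.exp(mean_y - b * mean_x)    # intercept transformed back to hours
    return a, b


if __name__ == "__main__":
    # Synthetic, noise-free observations generated from assumed parameters.
    xs = list(range(1, 13))
    ys = [1000.0 * x ** -0.3 for x in xs]
    print(fit_log_linear(xs, ys))
```

The fit minimizes squared errors in the logarithm of cost rather than in cost itself, which is one reason linear and nonlinear estimates of the same curve can differ.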

B.   UNIT  LEARNING  CURVE 

The  unit  learning  curve,  as  also  discussed  above,  was 
first  formulated  by  J.  R.  Crawford.  He  disagreed  with 
Wright's  log-linear  formulation  of  the  cumulative  average 
learning  curve.  Crawford  believed  the  relationship  between 
cumulative  quantity  produced  and  the  cost  of  the  final  unit 
of  that  quantity  was  log-linear  and  was  formulated  as: 


$$Y_X = aX^b$$


where 

$Y_X$: cost of the final unit

X:    cumulative  quantity  produced 

a:    direct  manhour  cost  for  first  unit 

b:    factor  of  cost  variation 
The  same  comments  and  assumptions  concerning  the  cumulative 
average  learning  curve  apply. 
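
A power-law curve of this form can also be fitted directly in the original units of cost, without a logarithmic transformation. The sketch below is one simple way to do so and is not necessarily the nonlinear regression algorithm used in this study: for any fixed b the least-squares a has a closed form, so the search reduces to one dimension, handled here by a golden-section search that assumes the error is unimodal in b over the chosen interval. All names and data are illustrative assumptions.

```python
import math


def best_scale(b, xs, ys):
    """For a fixed exponent b, return the least-squares a and the resulting SSE."""
    powers = [x ** b for x in xs]
    a = sum(y * p for y, p in zip(ys, powers)) / sum(p * p for p in powers)
    sse = sum((y - a * p) ** 2 for y, p in zip(ys, powers))
    return a, sse


def fit_power_curve(xs, ys, lo=-1.0, hi=0.0, tol=1e-10):
    """Golden-section search on b over [lo, hi]; returns (a, b) for y = a * x**b."""
    ratio = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        left = hi - ratio * (hi - lo)
        right = lo + ratio * (hi - lo)
        # Keep the sub-interval that must contain the minimum of a unimodal SSE.
        if best_scale(left, xs, ys)[1] < best_scale(right, xs, ys)[1]:
            hi = right
        else:
            lo = left
    b = (lo + hi) / 2.0
    return best_scale(b, xs, ys)[0], b


if __name__ == "__main__":
    xs = list(range(1, 21))
    ys = [1000.0 * x ** -0.3 for x in xs]  # synthetic, noise-free data
    print(fit_power_curve(xs, ys))
```

For simplicity the search re-evaluates both interior points on every pass instead of reusing one of them, which costs extra function calls but keeps the logic short.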

C.   BOGER,  JONES,  AND  SONTHEIMER  MODEL 

Boger, Jones, and Sontheimer express the costs of
production  over  a  time  period  as  opposed  to  over  the 
production  of  cumulative  units  regardless  of  time.  They  use 
the  cumulative  average  learning  curve  as  the  starting  point 
in  their  formulation. 
As discussed above, the typical cumulative average
learning  curve  is  of  the  form: 


$$Y(t) = aQ(t)^b \tag{1}$$


where  now 

Y(t):   average  cost  per  unit 

Q(t):   cumulative  quantity  of  units  produced  through 
time  t 

a,b:    learning  curve  parameters 

The typical progress function (learning curve) treats the
inputs as varying continuously and causing a related
continuous variation in some product (output) [Ref. 14:
p. 23]. From (1) we can derive an expression for total
cost:


$$Q(t) \cdot Y(t) = aQ(t)^b \, Q(t)$$

$$X(t) = aQ(t)^{b+1} \tag{2}$$


where 

X(t):   total  quantity  of  inputs  consumed  by  the  production 
of  Q(t) 

This specification yields the following marginal
requirements, dX, for an incremented output, dQ:


$$\frac{dX}{dQ} = a(b + 1)Q^b \tag{3}$$
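
As a brief worked step, the marginal requirement follows from differentiating the total cost expression (2) with respect to Q. Dividing the result by the average cost in (1) gives a ratio that is added here only as an illustration; it uses no symbols beyond those already defined.

```latex
\begin{align*}
X(t) &= aQ(t)^{b+1} \\
\frac{dX}{dQ} &= a(b+1)Q^{b} \\
\frac{dX/dQ}{Y(t)} &= \frac{a(b+1)Q^{b}}{aQ^{b}} = b + 1
\end{align*}
```

For a negative b between -1 and 0, the marginal requirement is therefore a fixed fraction of the cumulative average cost; for example, b = -0.322 gives a ratio of about 0.678.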
Now, assume the product emerges in quantities at discrete
time  intervals.  That  is,  we  now  develop  an  algorithm  using 
the  cumulative  average  learning  curve  formulation  based  on 
how  many  units  are  produced  in  a  specified  time  period.  In 
application,  we  assume  that  progress  or  cost  per  quantity  is 
proportional  to  productivity  achieved  in  prior  production: 


$$X_t = \delta_t \frac{X_{t-1}}{q_{t-1}} q_t \tag{4}$$


where 

$q_t = dQ$: amount produced in time period $t$

$X_t = dX$: inputs used in time period $t$

$\delta_t$: proportionality constant

We assume that learning is derived not only from the
preceding period but from all the production prior to the
period we are in. So we first set:


$$\frac{X_t}{q_t} = \frac{dX}{dQ} = a(b + 1)Q^b$$


where 

Q  =  Q(t) 
Substituting  (4)  we  get: 


$$\delta_t \frac{X_{t-1}}{q_{t-1}} \frac{q_t}{q_t} = a(b + 1)Q^b$$

$$\delta_t \frac{X_{t-1}}{q_{t-1}} q_t = a(b + 1)Q^b q_t \tag{5}$$
We now let Q, the quantity of units produced up to time t,
be  equal  to  the  quantity  of  units  produced  through  time 
period  t-1.   Now,  substituting  into  (5): 


$$\delta_t \frac{X_{t-1}}{q_{t-1}} q_t = a(b + 1)\left[\sum_{j=1}^{t-1} q_j\right]^b q_t \tag{6}$$


Equation  (6)  assumes  learning  in  period  t  is  derived  only 
from  production  in  period  t-1.  We  assume  this  relationship 
must  hold  at  previous  time  periods  also.  So  rewriting  (4) 
and  (5)  for  period  t-1, 


$$X_{t-1} = \delta_{t-1} \frac{X_{t-2}}{q_{t-2}} q_{t-1} = a(b + 1)\left(Q^*\right)^b q_{t-1}$$


where 

$Q^*$: amount of units produced through time period $t-2$

Therefore,

$$X_{t-1} = a(b + 1)\left[\sum_{j=1}^{t-2} q_j\right]^b q_{t-1}$$

which leads to:

$$\frac{X_{t-1}}{q_{t-1}} = a(b + 1)\left[\sum_{j=1}^{t-2} q_j\right]^b$$
and substituting into (6):

$$\delta_t \, a(b + 1)\left[\sum_{j=1}^{t-2} q_j\right]^b q_t = a(b + 1)\left[\sum_{j=1}^{t-1} q_j\right]^b q_t$$

$$\delta_t = \left[\frac{\sum_{j=1}^{t-1} q_j}{\sum_{j=1}^{t-2} q_j}\right]^b \tag{7}$$

for $t = 3, 4, 5, \ldots, T$


Now  substituting  (7)  into  (4)  we  have: 


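$$X_t = \left[\frac{\sum_{j=1}^{t-1} q_j}{\sum_{j=1}^{t-2} q_j}\right]^b \frac{X_{t-1}}{q_{t-1}} q_t$$

$$\frac{X_t}{q_t} = \left[\frac{\sum_{j=1}^{t-1} q_j}{\sum_{j=1}^{t-2} q_j}\right]^b \frac{X_{t-1}}{q_{t-1}}$$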


Since  this  is  true  for  all  time  periods,  we  can  say 
$$\frac{X_{t-1}}{q_{t-1}} = \left[\frac{\sum_{j=1}^{t-2} q_j}{\sum_{j=1}^{t-3} q_j}\right]^b \frac{X_{t-2}}{q_{t-2}}$$

$$\frac{X_{t-2}}{q_{t-2}} = \left[\frac{\sum_{j=1}^{t-3} q_j}{\sum_{j=1}^{t-4} q_j}\right]^b \frac{X_{t-3}}{q_{t-3}}$$

and so on.


So,  substituting  recursively  we  have 


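$$\begin{aligned}
\frac{X_t}{q_t} &= \left[\frac{\sum_{j=1}^{t-1} q_j}{\sum_{j=1}^{t-2} q_j}\right]^b \left[\frac{\sum_{j=1}^{t-2} q_j}{\sum_{j=1}^{t-3} q_j}\right]^b \left[\frac{\sum_{j=1}^{t-3} q_j}{\sum_{j=1}^{t-4} q_j}\right]^b \cdots \left[\frac{\sum_{j=1}^{2} q_j}{q_1}\right]^b \frac{X_2}{q_2} \\
&= \left[\frac{\sum_{j=1}^{t-1} q_j}{q_1}\right]^b \frac{X_2}{q_2} \\
&= Z \left[\frac{\sum_{j=1}^{t-1} q_j}{q_1}\right]^b
\end{aligned}$$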
where

$Z$: direct manhours per quantity produced in second
time period

$\sum_{j=1}^{t-1} q_j$: total quantity of units produced prior to present
time period

$q_1$: quantity of units produced in time period one

$b$: factor of cost variation

$X_t/q_t$: average cost in direct manhours of units produced
in time period $t$

The length of the time period, although it must remain fixed
over the data space, can be any length, e.g., day, month, or
quarter. The quantity produced in a particular time period
need not be an integer amount although partial units
produced are generally not found in aircraft production
data. As in the cumulative average and unit learning curve
formulations, we expect the factor of cost variation to have
a negative value. This model also presupposes uniformity
between production units and also a constant production
rate.
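
The recursive substitution behind this model can be checked numerically. The sketch below computes the predicted average cost per period in two ways: directly from the closed form, in which the period-2 average cost Z is scaled by the ratio of cumulative prior output to first-period output raised to the power b, and period by period from the proportionality constant for each period. The function names, per-period quantities, Z, and b are assumptions for illustration only.

```python
import math


def bjs_closed_form(z, b, quantities):
    """Predicted X_t / q_t = Z * (Q_{t-1} / q_1)**b for t = 3..T.

    quantities[0] is q_1; z is the period-2 average cost X_2 / q_2.
    Returns a dict keyed by the 1-based period number t.
    """
    q1 = quantities[0]
    predictions = {}
    cumulative = quantities[0] + quantities[1]  # output through period 2
    for t in range(3, len(quantities) + 1):
        predictions[t] = z * (cumulative / q1) ** b
        cumulative += quantities[t - 1]         # now output through period t
    return predictions


def bjs_recursive(z, b, quantities):
    """Same predictions built period by period from delta_t."""
    predictions = {}
    previous_average = z           # X_2 / q_2
    q_through_t2 = quantities[0]   # output through period t - 2
    for t in range(3, len(quantities) + 1):
        q_through_t1 = q_through_t2 + quantities[t - 2]  # through period t - 1
        delta = (q_through_t1 / q_through_t2) ** b
        previous_average = delta * previous_average
        predictions[t] = previous_average
        q_through_t2 = q_through_t1
    return predictions


if __name__ == "__main__":
    quantities = [2, 3, 5, 8, 8]         # hypothetical units per period
    z, b = 50000.0, math.log2(0.80)      # hypothetical Z and exponent
    print(bjs_closed_form(z, b, quantities))
    print(bjs_recursive(z, b, quantities))
```

The two functions agree because the period-by-period ratios telescope, which is the step that collapses the recursion into the closed form.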
III.   DATA

A.   GENERAL 

The dependent variable in each of the models investigated
will involve a cost of some type. In each of our
models this cost will be measured as a function of direct
manhours expended in the production of some quantity of
units. Direct manhours will be defined as those hours spent
on fabrication, assembly, production flight, and other
production work associated with the basic aircraft.
Manhours pertaining to tooling, engineering, planning,
testing and subcontracting are not included in this
definition. It seems obvious that the way in which direct
manhours are accumulated can, and does, lead to
inconsistencies due to differences in accounting systems from
contractor to contractor. Even so, the use of direct
manhours has numerous advantages over the use of dollars as
a measure of cost. In using direct manhours, we avoid the
additional data computations involved in applying price
indices to transform all dollar costs into constant dollars.
We also avoid inaccuracies in the data caused by using price
indices which are inexact figures. Finally, direct manhours
is a variable comparable over a group of contractors
whereas, due to differences in wage rates from contractor to
contractor, costs measured in dollars are not the best tool
for comparison.

The  data  for  this  report  include  aircraft  production 
data  for  the  C-141  and  F-102.  The  C-141  was  produced  by  the 
Lockheed  Corporation  and  the  F-102  was  produced  by  General 
Dynamics.  The  C-141  program  produced  284  aircraft  from  July 
1962 through April 1968. The C-141 is a large, swept-wing,
four-engine jet cargo transport. The data for this study were
drawn  from  Orsini  [Ref.  15:p.  104].  Orsini  obtained  the 
data  from  C-141  Financial  Management  Reports  prepared  by  the 
contractor,  Lockheed  Aircraft  Corporation,  for  the  Air 
Force.  The  C-141  data  provided  a  large  sample  of  data  for 
which  a  basic  model  of  the  aircraft  was  produced  throughout 
the  production  program.  Uniformity  between  units  produced 
is  a  basic  assumption  in  the  application  of  the  learning 
curve  theory.  Orsini  aggregated  the  monthly  production  data 
into  quarterly  direct  manhour  production  data  reducing  the 
total number of data points to twenty-four. Orsini felt
this quantity was sufficient for his analysis and the
current research is similarly restricted. The data
variables used by Orsini and this researcher are:

1) direct labor hours per lot per month

2) aircraft per lot

3) delivery dates of each aircraft

The F-102 program produced 1000 aircraft from 1953
through 1958. The F-102 is a single-seat, supersonic,
delta-wing, all-weather fighter. The data for this study
were drawn from Gulledge and Womer [Ref. 12:p. 73]. A
comprehensive cost breakdown by individual airframe was
provided by the "F-102 Program Cost History" document, the
source of the Womer and Gulledge data. The F-102 program
consisted of the production of F-102 airframes and TF-102
airframes. The TF-102 observations were not deleted for
the sake of strict uniformity, since it was assumed that
learning was experienced in the production of these
airframes. As Womer and Gulledge note, the total manhours
expended per airframe can be disaggregated into three parts:
details, assemblies, and outside-of-factory labor. Total
direct cost per airframe is comprised of only detail and
assembly hours. The detail hours are comprised of
fabrication hours, and assembly hours include subassembly,
major assembly, primary assembly, and final assembly hours.
After the portion of labor hours expended per airframe
outside the factory is deleted, the total direct cost per
airframe is left.

B.   REFINEMENT 

As already discussed, three models will be utilized in
the examination of two sets of aircraft production data.
Parameter estimation for these models requires the data to
be in a particular form for each model. The C-141 production
data are available for aircraft grouped into production lots
and the F-102 production data are available for each
airframe. Since the models do not each fit the particular
form of each data set, adjustments and refinements need to
be made to the data to fit the different learning curve
formulations.

1.   Cumulative  Average  Learning  Curve 

The  data  requirements  for  the  cumulative  average 
learning  curve  are  rather  straightforward.  The  independent 
variable  is  the  cumulative  quantity  of  aircraft  produced. 
The  dependent  variable  is  the  average  amount  of  direct  labor 
hours  expended  per  unit  in  the  production  of  the  cumulative 
quantity  produced.  The  F-102  and  C-141  adjusted  data  used 
to  fit  the  cumulative  average  learning  curve  are  tabulated 
in  Appendix  A. 

The F-102 data consist basically of total hours expended
in the production of each airframe. This data set is easily
refined to meet the data requirements of the cumulative
average learning curve. As previously discussed, the F-102
total manhours per aircraft consisted of three parts:
details, assemblies, and outside-of-factory labor. Table I,
extracted from Womer and Gulledge [Ref. 12:p. 86], provided
the information necessary to translate the raw data into
direct manhours per airframe. Since this table only applied
to lots four through eleven, only these 204 observations
were utilized. The airframes in lots four through eleven
were then ordered with respect to delivery sequence number.
This sequence (1, 2, 3, ..., 204) provided the independent
variable data vector. The sequence of cumulative sums of
direct manhours divided by the cumulative number of
airframes delivered for each element of that sequence
provided the dependent variable data vector.

TABLE I

PERCENT OF TOTAL MANHOURS ALLOCATED TO
SPECIFIC ACTIVITIES BY CONTRACT

                                 Contract
                      5942    23903    29264    31174    33965

Fabrication          19.45    21.98    21.23    16.12    18.47

Assembly             65.82    70.56    64.82    66.27    61.62

Outside of Factory   14.73     7.46    13.95    17.61    19.91
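
The F-102 refinement described above can be sketched in a few lines. The example assumes that direct manhours are obtained by applying the fabrication and assembly percentages of Table I to each airframe's total hours, and then forms the running cumulative average in delivery order. The function names and the per-airframe totals are hypothetical, and the assignment of contracts to lots is not reproduced here.

```python
def direct_manhours(total_hours, fabrication_pct, assembly_pct):
    """Direct manhours as the fabrication plus assembly share of total hours."""
    return total_hours * (fabrication_pct + assembly_pct) / 100.0


def cumulative_average_series(unit_hours):
    """Return [(X, cumulative average hours through unit X), ...] in delivery order."""
    series = []
    running_total = 0.0
    for x, hours in enumerate(unit_hours, start=1):
        running_total += hours
        series.append((x, running_total / x))
    return series


if __name__ == "__main__":
    # Hypothetical total hours for three airframes under contract 5942.
    totals = [120000.0, 110000.0, 104000.0]
    direct = [direct_manhours(h, 19.45, 65.82) for h in totals]
    for x, average in cumulative_average_series(direct):
        print(x, round(average, 1))
```

The first element of each pair is the independent variable and the second is the dependent variable for the cumulative average specification.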

The C-141 data were organized into twelve lots. The
number of units in each lot and the number of direct
manhours expended in the production of each lot of airframes
are provided. The data required for the cumulative average
learning curve are arrived at through a series of simple
calculations discussed in the RAND Memorandum "An
Introduction to Equipment Cost Estimating" [Ref. 16:p. 104].
The cumulative average hours are computed at the final unit
in each lot, where the cumulative average hour figures
apply. Therefore, twelve data points will be used in the
parameter estimation for the C-141 cumulative average
learning curve formulation.
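
The lot-level calculation can be sketched as follows: accumulate units and hours lot by lot, and record the cumulative average at the last unit of each lot. This is only an illustration of the arithmetic described above; the function name and lot figures are hypothetical and are not the C-141 data.

```python
def lot_cumulative_averages(lots):
    """lots: list of (units_in_lot, direct_manhours_for_lot) in production order.

    Returns [(cumulative units at the lot's last unit, cumulative average hours), ...].
    """
    points = []
    cumulative_units = 0
    cumulative_hours = 0.0
    for units, hours in lots:
        cumulative_units += units
        cumulative_hours += hours
        points.append((cumulative_units, cumulative_hours / cumulative_units))
    return points


if __name__ == "__main__":
    # Hypothetical lots: (units, direct manhours).
    lots = [(5, 500.0), (10, 700.0), (15, 900.0)]
    for x, average in lot_cumulative_averages(lots):
        print(x, average)
```

Each lot contributes exactly one data point, which is why twelve lots yield twelve observations.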

2.   Unit Learning Curve

The data requirements for the unit learning curve
are also rather straightforward. The independent variable
is the cumulative quantity of aircraft produced. The
dependent variable is the amount of direct manhours expended
in the production of the final unit of the cumulative
quantity produced. The F-102 and C-141 adjusted data used
to fit the unit learning curve are tabulated in Appendix B.
The F-102 data are again easily refined to meet the data
requirements of the unit learning curve. Table I is used to
translate the raw data of lots four through eleven into
direct manhours per airframe. The airframes were then
ordered with respect to delivery sequence number. This
sequence of 204 airframes, with each unit's respective
direct labor hours required for production, provides the
independent and dependent variable data vectors for the
estimation of the parameters of the unit learning curve.

Since  the  C-141  production  data  are  grouped  into 
lots,  a  rather  gross  approximating  technique  is  required  to 
transform  the  data  into  the  form  required  by  the  unit 
learning  curve  specification.  The  average  number  of  labor 
hours  for  each  lot  is  treated  as  if  it  were  an  observation 
on  the  labor  hours  required  to  produce  the  unit  at  the  lot 
midpoint.  When  dealing  with  a  log-linear  relationship,  the 
arithmetic  midpoint  produces  unequal  areas  under  the 
learning  curve  between  the  first  and  last  units  of  each 
respective lot. The exact determination of a true lot
midpoint depends on the lot quantity, type of curve
hypothesized, and the true slope of the learning curve
[Ref. 16:p. 105]. In order to avoid the shortcomings of the
arithmetic midpoint, the algebraic midpoint, K, discussed in
[Ref. 17:p. 44] will be used:
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m(l  +  B) 

(L  +  .5)  (1  +  B)  -  (F  -  .5)  (1  +  B) 


m:   lot  quantity 

B:   learning  curve  slope 

L:   last  unit  of  the  lot 

F:   first  unit  of  the  lot 

An estimate of B from Womer and Patterson's report
[Ref. 5:p. 267] is used in calculating the algebraic
midpoint. Again, twelve data points are used in the
parameter estimation for the C-141 unit learning curve
specifications.
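
The midpoint calculation can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bracketed ratio is taken to be raised to the power -1/B, which is the exponent that makes a*K^B equal the average hours per unit when the lot total is approximated by integrating the unit curve from F - .5 to L + .5. The function name and the sample slope are hypothetical and do not come from the study's data.

```python
def algebraic_midpoint(first: int, last: int, b: float) -> float:
    """Algebraic lot midpoint K for a lot covering units first..last.

    b is the learning curve exponent B (negative when hours fall as
    cumulative output grows). b must not be 0 or -1, where the
    expression is undefined.
    """
    m = last - first + 1  # lot quantity
    # Integral approximation to the sum of x**b over the lot, times (1 + b).
    span = (last + 0.5) ** (1 + b) - (first - 0.5) ** (1 + b)
    # Assumed outer exponent -1/b (see the lead-in above).
    return (m * (1 + b) / span) ** (-1.0 / b)


if __name__ == "__main__":
    # Hypothetical lot: units 11 through 20, illustrative exponent -0.322.
    k = algebraic_midpoint(11, 20, -0.322)
    print(f"algebraic midpoint: {k:.3f}, arithmetic midpoint: {(11 + 20) / 2}")
```

For a negative exponent the algebraic midpoint falls below the arithmetic midpoint, because the earlier, more expensive units in the lot pull the representative unit forward.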

3.   Boger, Jones, and Sontheimer Model

The data requirements for this model are based on
the statement regarding the marginal requirements for
incremental outputs of product produced in Boger, Jones, and
Sontheimer's paper [Ref. 14:p. 23]. That is, the product
emerges in lots or lumps, $q_i$, at discrete intervals using
discrete inputs, $X_i$, of the composite resource (direct labor
hours). Therefore, the data requirements for this model
are: quantity of units produced each time period and the
direct labor hours expended in the production of units
produced in each time period.

The complete data base for the F-102 program
contains total labor hours for each airframe. These data are
not in the form required for the Boger et al. model. Womer
and Gulledge took considerable care in resolving the data
problem in their study [Ref. 12:p. 85]. Their work made the
data compatible with the theoretical model they were
testing. The information concerning the F-102 program that
Womer and Gulledge discuss made it possible to apply some
further adjustments to establish a data base compatible with
the Boger et al. model.

As discussed before, the ideal data for the Boger
et al. model are the total number of aircraft produced in a
specific time period, $q_i$, and the quantity of direct labor
hours, $X_i$, expended in producing $q_i$. Although these data are
not directly available, Womer and Gulledge derived the next
best alternative: cost by lot per month. Due to the
nonavailability of certain information, Womer and Gulledge
were able to approximate the cost by lot per month only for
lots four through eleven.

Tables I, II, and III along with the F-102 data base
in [Ref. 12:pp. 83-85] provided enough information to adjust
the data for lots four through eleven for use in the Boger
et al. model. The first adjustment was to use Table I and
the total labor hours expended on each airframe in lots four
through eleven to arrive at values for cumulative
fabrication and assembly hours for each airframe. As discussed
earlier, these hours comprise the direct labor hours
expended for each airframe. The next step was to calculate
the equivalent airframe units produced per month for each
lot. This was calculated by first determining the empirical
production rates for each lot:


$$Y_f = \frac{\sum_{\text{airframes in lot}} DMH_f}{\text{airframes in lot}} \qquad \text{for lots } 4, 5, 6, \ldots, 11$$

$$Y_a = \frac{\sum_{\text{airframes in lot}} DMH_a}{\text{airframes in lot}} \qquad \text{for lots } 4, 5, 6, \ldots, 11$$

Production rate (fab) = $1/Y_f$
Production rate (assem) = $1/Y_a$

where

$DMH_f$:   direct manhours for fabrication
$DMH_a$:   direct manhours for assembly

The production rates for fabrication and assembly were then
applied in conjunction with Tables II and III to the
cumulative fabrication and assembly hours per month per lot,
and the results were added to arrive at equivalent aircraft
produced per month per lot. These results were then summed
across lots four through eleven for each month, again using
Tables II and III, to arrive at equivalent units produced per
month. Direct labor hours expended per month on the
equivalent quantity of airframes produced per month were
similarly calculated. The adjusted F-102 production data
per month for lots four through eleven for use in the Boger
et al. model are summarized in Appendix C.

The original form of the C-141 data made available
to Orsini by the Air Force Plant Representative Office was
direct manhours per lot per month, expended as direct labor
hours as defined previously, and the quantity of aircraft per
lot. Orsini then aggregated these monthly data into
quarterly data points and tabulated them as direct manhours
per lot per quarter. The adjustments made to the data by
Orsini for his analysis were compatible with the refinements
required by the Boger et al. model. Average production
rate for each lot was first determined by dividing total
aircraft in each lot by the total amount of direct labor
hours attributed to the production of each respective lot.
This average production rate was then applied to the
tabulated quarterly data to arrive at equivalent units
produced per lot per quarter. The equivalent units produced
per lot per quarter and direct labor hours per lot per quarter
were then summed across lots, for the quarters in which each
lot was worked on, to arrive at equivalent units produced per
quarter and direct labor hours expended per quarter. The
data, as refined by Orsini, used in the Boger et al. model
are tabulated in Appendix C.
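
The C-141 equivalent-unit adjustment described above can be sketched as follows. This is a minimal illustration, not Orsini's actual procedure: the dictionary layout, the function name, and the sample lots are all hypothetical, and the numbers are made up for the example.

```python
def equivalent_units(lot_quantity, hours_by_lot):
    """Convert labor hours per lot per quarter into equivalent units per quarter.

    lot_quantity: {lot: aircraft in the lot}
    hours_by_lot: {lot: {quarter: direct labor hours charged to the lot}}
    Returns (units_per_quarter, hours_per_quarter), both keyed by quarter.
    """
    units = {}
    hours = {}
    for lot, by_quarter in hours_by_lot.items():
        # Average production rate: aircraft per direct labor hour for the lot.
        rate = lot_quantity[lot] / sum(by_quarter.values())
        for quarter, h in by_quarter.items():
            # Equivalent units credited to this lot in this quarter,
            # accumulated across all lots worked on in the quarter.
            units[quarter] = units.get(quarter, 0.0) + rate * h
            hours[quarter] = hours.get(quarter, 0.0) + h
    return units, hours


if __name__ == "__main__":
    # Hypothetical data: two overlapping lots.
    qty = {"A": 10, "B": 5}
    hrs = {"A": {1: 600.0, 2: 400.0}, "B": {2: 250.0, 3: 250.0}}
    units, hours = equivalent_units(qty, hrs)
    for q in sorted(units):
        print(q, round(units[q], 3), hours[q])
```

By construction the equivalent units summed over all quarters equal the total number of aircraft in the lots, so the adjustment redistributes output over time without changing its total.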

IV.   METHODOLOGY

A.   LINEAR  REGRESSION 

Historically,  it  has  usually  been  assumed  that  the 
relationship  between  the  independent  and  dependent  variables 
of  a  learning  curve  specification  is  log-linear.  This 
assumption  has  made  it  particularly  easy  to  estimate  the 
learning  curve  parameters  through  simple  linear  regression 
when  only  one  independent  variable  is  used.  In  this  study, 
the  least  squares,  normal  error  regression  model  is 
utilized.   The  normal  error  model  is: 

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i \qquad \text{for } i = 1, 2, 3, \ldots$$

where

$Y_i$:   observed response of the $i$th trial
$X_i$:   the level of the independent variable in the $i$th trial
$\beta_0, \beta_1$:   regression parameters
$\varepsilon_i$:   residuals which are distributed $N(0, \sigma^2)$

Normality of the error terms seems reasonable since the
residuals probably represent the accumulation of many
effects that are omitted from the model. The cumulative
error term, $\varepsilon_i$, would tend to comply with the central limit
theorem and approach normality. Since the error terms are
assumed to be normally distributed, the assumption of no
correlation between residuals becomes one of independence.
In addition, the assumption of normality allows one to perform
some parametric statistical tests in evaluating the
statistical significance of the estimated parameters and the
aptness of the model.
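
As an illustration of the log-linear approach, the sketch below fits a learning curve by ordinary least squares on natural-log transformed data. It is a minimal example rather than the software used in the study; the function name and the sample data are hypothetical.

```python
import math


def fit_log_linear(x, y):
    """Least squares fit of ln(y) = b0 + b1 * ln(x).

    Returns (b0, b1); exp(b0) is the first-unit value and b1 is the
    learning curve exponent. All x and y values must be positive.
    """
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1


if __name__ == "__main__":
    # Hypothetical noise-free data generated from y = 100 * x**-0.3.
    x = [1.0, 2.0, 4.0, 8.0]
    y = [100.0 * v ** -0.3 for v in x]
    b0, b1 = fit_log_linear(x, y)
    print(math.exp(b0), b1)
```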

B.   NON-LINEAR  REGRESSION 

Non-linear  regression  software  in  STATGRAPHICS  [Ref.  18: 
pp.  19-35]  is  used  as  an  alternative  method  of  parameter 
estimation.  In  this  procedure,  least  squares  estimates  of 
the  parameters  of  a  non-linear  model  are  determined.  The 
learning  curve  formulations  in  this  study  are  inherently 
non-linear when the data are in their raw form. The
non-linear model is:

$$Y_i = a X_i^{b} + \varepsilon_i \qquad \text{for } i = 1, 2, 3, \ldots$$

where

$Y_i$:   observed response of the $i$th trial
$X_i$:   level of the independent variable of the $i$th trial
$a, b$:   regression parameters
$\varepsilon_i$:   residuals which are distributed $N(0, \sigma^2)$

The non-linear regression method utilized in the
STATGRAPHICS software was developed by D. W. Marquardt and
represents a compromise between the linearization (Taylor
series) method and the steepest descent method of non-linear
parameter estimation. Marquardt's compromise has been
described as combining the best features of the
linearization and steepest descent methods while avoiding their most
serious limitations. A detailed discussion and references
for this algorithm are contained in Draper and Smith's
Applied Regression Analysis, Second Edition [Ref. 19:
p. 471]. An important aspect of non-linear regression that
deviates from the linear case is worth mentioning. Even when
the error term of the non-linear model is assumed to be normally
distributed, the parameter estimates are no longer normally
distributed and the sample residual variance is no longer an
unbiased estimate of the residual variance. While suitable
comparisons of mean squares can be made visually, the usual
F-tests for regression and lack of fit are not valid, in
general, for the non-linear case [Ref. 19:p. 484].
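
To make the idea of Marquardt's compromise concrete, the following sketch fits Y = a*X^b with a basic damped Gauss-Newton loop. It is not the STATGRAPHICS implementation, whose internals are not described here; the damping schedule, tolerances, function names, and sample data are assumptions chosen for illustration.

```python
import math


def sse(a, b, x, y):
    """Sum of squared residuals for the model y = a * x**b."""
    return sum((yi - a * xi ** b) ** 2 for xi, yi in zip(x, y))


def fit_power_lm(x, y, a, b, lam=1e-3, max_iter=200, tol=1e-12):
    """Fit y = a * x**b by a simple Levenberg-Marquardt iteration.

    (a, b) are starting values; x must be positive. A small lam gives a
    nearly Gauss-Newton (linearization) step, a large lam a short step
    close to the steepest descent direction.
    """
    current = sse(a, b, x, y)
    for _ in range(max_iter):
        jaa = jab = jbb = ga = gb = 0.0
        for xi, yi in zip(x, y):
            p = xi ** b
            da = p                      # partial derivative w.r.t. a
            db = a * p * math.log(xi)   # partial derivative w.r.t. b
            r = yi - a * p
            jaa += da * da
            jab += da * db
            jbb += db * db
            ga += da * r
            gb += db * r
        # Damp the diagonal of J'J, then solve the 2x2 system for the step.
        maa = jaa * (1.0 + lam)
        mbb = jbb * (1.0 + lam)
        det = maa * mbb - jab * jab
        if det == 0.0:
            break
        step_a = (mbb * ga - jab * gb) / det
        step_b = (maa * gb - jab * ga) / det
        trial = sse(a + step_a, b + step_b, x, y)
        if trial < current:
            # Accept the step and relax the damping.
            a, b = a + step_a, b + step_b
            improvement = current - trial
            current = trial
            lam /= 10.0
            if improvement < tol:
                break
        else:
            # Reject the step and increase the damping.
            lam *= 10.0
            if lam > 1e12:
                break
    return a, b


if __name__ == "__main__":
    # Hypothetical noise-free data from y = 1000 * x**-0.3.
    x = [float(i) for i in range(1, 11)]
    y = [1000.0 * xi ** -0.3 for xi in x]
    print(fit_power_lm(x, y, a=900.0, b=-0.2))
```

In practice the log-linear least squares estimates are a natural choice of starting values for the iteration.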

C.   DATA  ANALYSIS 

Examination of the observed residuals of a regression
model is an important aspect of any regression technique.
If the model is appropriate, the observed residuals should
reflect the properties assumed for the error term in the
regression model. In this study, both graphical and
statistical tests involving the residuals will be performed.
Evaluation of the residuals of the various models to be
considered will address possible departures from the model,
including: the regression model does not hold, the error
terms do not have constant variance, the error terms are not
independent, the model fits all but one or a few outliers,
and the error terms are not normally distributed.

After fitting a model to the data, residuals that fall
into a horizontal band centered at zero, display no
systematic tendency to be positive or negative, and
appear to be randomly scattered would suggest that the
assumptions of the model are not violated. This
would imply the model is well suited to the data. If this
is not the case, remedial measures would need to be taken.
Generally speaking, two types of remedial measures
are normally followed: abandon the model altogether or
use some transformation on the data so the model is
appropriate for the transformed data. In this report, only two
aspects of data transformation will be addressed:
autocorrelation and the handling of outliers. If, after these
two problems are dealt with, further residual analysis
clearly implies the assumptions of the model are not met,
the model will be rejected.

1.   Autocorrelation

The regression models of ordinary least squares or
maximum likelihood techniques consider the stochastic
disturbance terms, the residuals of the regression, to be
either uncorrelated or independent normal random variables.
In the application of regression models to learning curves,
we use time series data. The assumption of no correlation
or independence between error terms for time series data is
often inappropriate. The observed correlation between
residuals of regression modeling is called autocorrelation
or serial correlation.

Neter and Wasserman outline the problems associated
with autocorrelation:

i)  The regular least squares regression coefficients are
still unbiased but no longer have the minimum
variance property and may be quite inefficient.

ii)  The mean squared error (MSE) may seriously
underestimate the variance of the error terms.

iii)  The estimated standard deviation of the regression
coefficients may be seriously underestimated and $R^2$
may be overestimated.

iv)  The confidence intervals and tests using the
Student's t and F distributions are no longer
strictly applicable.   [Ref. 20:p. 352]

In this study, the existence of first-order
autocorrelation, AR(1), will be investigated graphically and
will be statistically tested using the Durbin-Watson test.
If examination of the residuals indicates that
autocorrelation exists, this information will be used to improve the
regression model. The autocorrelation will be modeled and
accounted for in a transformation of the model data.

The first-order autocorrelation error model
discussed by Neter and Wasserman [Ref. 20:p. 353] for a
simple linear regression is:

$$Y_t = \beta_0 + \beta_1 X_t + \varepsilon_t$$

$$\varepsilon_t = \rho\,\varepsilon_{t-1} + u_t$$

where

$\rho$:   autocorrelation parameter, $|\rho| < 1$
$u_t$:   independent and distributed $N(0, \sigma^2)$

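The following discussion also applies in a nonlinear model
when the error term is additive. It can be shown that the
properties of the error terms lead to the following
conclusions:

i)   $E(\varepsilon_t) = 0$

ii)  $\mathrm{var}(\varepsilon_t) = \sigma^2 \sum_{s=0}^{\infty} \rho^{2s}$

iii) $\mathrm{cov}(\varepsilon_t, \varepsilon_{t-s}) = \rho^{s} \left( \frac{\sigma^2}{1 - \rho^2} \right), \qquad s \neq 0$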


These imply the error terms for the first-order
autoregressive model are autocorrelated unless the autocorrelation
parameter, $\rho$, equals zero [Ref. 20:p. 357].

When the autocorrelation parameter, $\rho$, is not zero,
it will be necessary to estimate the value of $\rho$ for use in
the autoregressive structure as a source of additional
information in our regression model.

Following a graphical inspection of the residuals,
the Durbin-Watson test will be utilized to test the
hypothesis:

$H_0$:   $\rho = 0$    implying no autocorrelation

$H_1$:   $\rho > 0$

The test statistic, D, used in this text is:

$$D = \frac{\sum_{i=2}^{n} (e_i - e_{i-1})^2}{\sum_{i=1}^{n} e_i^2}$$

where

$e_i$:   $i$th residual of the regression model
$n$:    number of data points used in the regression

If we reject the null hypothesis, this test statistic, D,
can be used further to estimate the autocorrelation
coefficient, $\rho$. The estimate of $\rho$, $r_1$, is discussed by
Neter and Wasserman [Ref. 20:p. 358] and is:

$$r_1 = \frac{\sum_{i=2}^{n} e_{i-1} e_i}{\sum_{i=2}^{n} e_{i-1}^2} \qquad (1)$$

For sufficiently large n, an alternative estimator of $\rho$
derived by Theil and Nagar [Ref. 21:p. 164] is:

$$r_2 = \frac{1 - \frac{D}{2} + \left(\frac{k}{n}\right)^2}{1 - \left(\frac{k}{n}\right)^2} \qquad (2)$$

where k is the number of parameters to be estimated in the
regression model.

When $n \gg k$ then

$$r_3 = 1 - \frac{D}{2} \qquad (3)$$

The estimators for the autocorrelation parameter in
equations (2) and (3) will be used in this study.
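
The Durbin-Watson statistic and the three estimators of the autocorrelation parameter can be computed directly from a residual vector. The sketch below assumes the standard forms of these quantities (the Theil and Nagar estimator is written in its usual squared k/n form); the function names and the sample residuals are hypothetical.

```python
def durbin_watson(e):
    """Durbin-Watson statistic D for a sequence of residuals e."""
    num = sum((e[i] - e[i - 1]) ** 2 for i in range(1, len(e)))
    den = sum(v * v for v in e)
    return num / den


def rho_r1(e):
    """Estimator r1: lag-one cross products over lagged squares."""
    num = sum(e[i - 1] * e[i] for i in range(1, len(e)))
    den = sum(e[i - 1] ** 2 for i in range(1, len(e)))
    return num / den


def rho_theil_nagar(d, n, k):
    """Estimator r2 (Theil and Nagar) from D, sample size n, and
    number of estimated regression parameters k."""
    ratio_sq = (k / n) ** 2
    return (1.0 - d / 2.0 + ratio_sq) / (1.0 - ratio_sq)


def rho_from_dw(d):
    """Estimator r3, the large-sample approximation 1 - D/2."""
    return 1.0 - d / 2.0


if __name__ == "__main__":
    # Hypothetical residuals with a slowly drifting (positively
    # autocorrelated) pattern.
    e = [0.5, 0.4, 0.2, -0.1, -0.3, -0.4, -0.2, 0.1, 0.3]
    d = durbin_watson(e)
    print(d, rho_r1(e), rho_theil_nagar(d, len(e), 2), rho_from_dw(d))
```

Values of D well below 2 point toward positive first-order autocorrelation; the decision itself still requires the tabulated Durbin-Watson bounds, which are not reproduced here.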

The iterative method of incorporating the first-order
autoregressive model into the regression model is used
and discussed in Neter and Wasserman [Ref. 20:p. 361] and
Intriligator [Ref. 21:p. 164]. The data are first
transformed:

$$X_i' = \sqrt{1 - r_j^2}\; X_i \qquad \text{for } i = 1;\ j = 1, 2, \text{ or } 3$$

$$Y_i' = \sqrt{1 - r_j^2}\; Y_i \qquad \text{for } i = 1;\ j = 1, 2, \text{ or } 3$$

$$X_i' = X_i - (r_j \cdot X_{i-1}) \qquad \text{for } i = 2, 3, \ldots, n;\ j = 1, 2, \text{ or } 3$$

$$Y_i' = Y_i - (r_j \cdot Y_{i-1}) \qquad \text{for } i = 2, 3, \ldots, n;\ j = 1, 2, \text{ or } 3$$

The regression is then performed with the transformed data.
The Durbin-Watson test is then employed to test whether the
new residuals for the transformed data are uncorrelated.
The procedure discussed above continues until the
Durbin-Watson null hypothesis can no longer be rejected.
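
One pass of the data transformation can be sketched as below. The sketch assumes the first observation is retained and scaled by the square root of (1 - r^2), with every later observation replaced by its quasi-difference; the function name and sample values are hypothetical.

```python
import math


def ar1_transform(values, r):
    """Transform a data vector for first-order autocorrelation.

    The first observation is scaled by sqrt(1 - r**2); each later
    observation has r times its predecessor subtracted. Requires |r| < 1.
    """
    first = math.sqrt(1.0 - r * r) * values[0]
    rest = [values[i] - r * values[i - 1] for i in range(1, len(values))]
    return [first] + rest


if __name__ == "__main__":
    # Hypothetical log-transformed data and an assumed estimate r = 0.6.
    ln_x = [2.0, 3.0, 5.0]
    print(ar1_transform(ln_x, 0.6))
```

The same function is applied to both the independent and the dependent data vectors before the regression is rerun, and the cycle of estimating r, transforming, and refitting is repeated on the new residuals.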

2.  Outliers 

The presence of outliers can cause some difficulty
when fitting a model using the least squares method.
Outliers can either be errant observations or result
from an interaction with a variable that is not included
in the model. In either case, when outliers exist, those
particular data points should be addressed. If evidence
exists that abnormal circumstances surround a particular
data point, it is safe to discard it. To address
outliers, the analyst must be familiar with the data or
have the resources to investigate them adequately.
In this report, the resources to address the nature of
outliers adequately do not exist; therefore, residuals
which lie more than $\pm 4\sqrt{MSE}$ from zero will be designated
as outliers and rejected but annotated.
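
The outlier rule can be expressed as a short screening function. This sketch assumes MSE is the residual sum of squares divided by n minus the number of estimated parameters (two for a simple linear regression); the function name and the sample residuals are hypothetical.

```python
import math


def flag_outliers(residuals, n_params=2, width=4.0):
    """Return the indices of residuals farther than width * sqrt(MSE) from zero.

    MSE is taken as SSE / (n - n_params), an assumption matching the
    simple linear regression case.
    """
    n = len(residuals)
    mse = sum(e * e for e in residuals) / (n - n_params)
    limit = width * math.sqrt(mse)
    return [i for i, e in enumerate(residuals) if abs(e) > limit]


if __name__ == "__main__":
    # Hypothetical residuals: thirty small values and one large one.
    residuals = [0.1, -0.1] * 15 + [5.0]
    print(flag_outliers(residuals))
```

Because a large residual inflates MSE itself, a four-standard-deviation band is a conservative screen: only very extreme observations are flagged.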

3.   Normality of Error Terms

As discussed by Neter and Wasserman [Ref. 20:p.
107], small departures from normality do not create any
serious problems in the fitting of the model. Major
departures, on the other hand, should be of concern. The
normality assumption will be graphically addressed through
probability and symmetry plots. A rough statistical test
addressing normality of the error terms is discussed in
Neter and Wasserman [Ref. 20:p. 107]. If 90 percent of
the standardized residuals, $e_i/\sqrt{MSE}$, fall between the
appropriate standard normal values or the corresponding
Student's t-values for small sample sizes, the normal
assumption will not be rejected.

4.   Homoscedasticity

The assumption of constant variance of the residuals
will also be addressed graphically and statistically.
Residual plots will initially be inspected prior to
conducting a non-parametric rank correlation test between the
absolute value of the residual and the value of the
independent variable as discussed in Conover [Ref. 22:p. 255]. The
assumption of constant variance will be rejected if the
hypothesis of no correlation is rejected in this
non-parametric test.
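
A rank correlation check of this kind can be sketched as follows. The statistic computed is the sum of squared rank differences (the Hotelling-Pabst form of the Spearman test); whether this matches every detail of the procedure in Conover is an assumption, and the function names and data are hypothetical. Critical values must still be taken from tables.

```python
def ranks(values):
    """Ranks starting at 1; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def hotelling_pabst(x, residuals):
    """Sum of squared differences between the ranks of x and of |residual|."""
    rank_x = ranks(x)
    rank_e = ranks([abs(e) for e in residuals])
    return sum((a - b) ** 2 for a, b in zip(rank_x, rank_e))


def spearman_rho(t, n):
    """Spearman's rho from the statistic T; exact only when there are no ties."""
    return 1.0 - 6.0 * t / (n * (n * n - 1))


if __name__ == "__main__":
    # Hypothetical case: residual magnitude shrinks as x grows.
    x = [1.0, 2.0, 3.0, 4.0]
    e = [0.4, -0.3, 0.2, -0.1]
    t = hotelling_pabst(x, e)
    print(t, spearman_rho(t, len(x)))
```

A rho near zero is consistent with constant variance, while a strong positive or negative rank correlation suggests the spread of the residuals changes with the independent variable.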

D.   INFERENCES  CONCERNING  PARAMETER  ESTIMATION 

Following verification of the underlying assumptions of
a simple linear regression, it is of interest to investigate
the statistical significance of the parameter estimates
in the model:

$$Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$$

It is of interest to initially test the hypothesis

$H_0$:   $\beta_1 = 0$

$H_1$:   $\beta_1 \neq 0$


to see if there is a statistically significant linear
relationship between the independent and dependent variables.
It can be shown that if the underlying assumptions of the
model hold, the parameter estimate of $\beta_1$, $b_1$, is normally
distributed [Ref. 20:p. 53]. Therefore, $(b_1 - \beta_1)/s(b_1)$ is
distributed as $t(n - 2)$. Furthermore, the test to decide
whether $\beta_1$ is statistically equal to zero is based on the
test statistic:

$$T_1 = b_1/s(b_1)$$

The decision rule, at a significance level $\alpha$, is given by
Neter and Wasserman as [Ref. 20:p. 61]:

Accept $H_0$ if $|T_1| \le t(1 - \alpha/2,\ n - 2)$
Otherwise reject $H_0$

Similarly, it can be shown that inferences concerning $\beta_0$ are
analogous to those for $\beta_1$ [Ref. 20:p. 61].
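
The slope test can be illustrated with a short calculation. The sketch assumes the usual simple linear regression formulas, with MSE equal to SSE/(n - 2) and s(b1) equal to the square root of MSE divided by the sum of squared deviations of X; the critical t value is supplied by the caller from a table. Function names and data are hypothetical.

```python
import math


def slope_t_statistic(x, y):
    """Return (b1, s_b1, t1) for the least squares line y = b0 + b1 * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((u - mean_x) ** 2 for u in x)
    b1 = sum((u - mean_x) * (v - mean_y) for u, v in zip(x, y)) / sxx
    b0 = mean_y - b1 * mean_x
    sse = sum((v - b0 - b1 * u) ** 2 for u, v in zip(x, y))
    mse = sse / (n - 2)
    s_b1 = math.sqrt(mse / sxx)
    return b1, s_b1, b1 / s_b1


def reject_h0(t1, critical_value):
    """Two-sided decision rule: reject H0 when |T1| exceeds t(1 - alpha/2, n - 2)."""
    return abs(t1) > critical_value


if __name__ == "__main__":
    # Hypothetical data; 3.182 is the tabulated t(.975, 3) value.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    b1, s_b1, t1 = slope_t_statistic(x, y)
    print(b1, s_b1, t1, reject_h0(t1, 3.182))
```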

The usual tests that are appropriate in the linear model
are, in general, not appropriate when the model is
non-linear. Draper and Smith [Ref. 19:p. 484] discuss why this
is so and also present a practical procedure that can
provide a measure of possible lack of fit for a non-linear
model. In the non-linear case, no statistical tests
concerning the parameter estimates will be discussed in this
study. Instead, the results of the non-linear regression
will only be compared to those of the simple linear
regression.

E.   VALIDATION 

Since time series data are being used, it is not possible
to split the developmental data and the validation data
randomly. For each learning curve formulation and each of the
two methods of parameter estimation, roughly the first
seventy-five percent of the data is used to fit each regression
model. The remaining data are saved to validate the
forecasting ability of the fitted model. While the validation
phase of model building is important, the criterion of the
validation phase, that is, determining how well a model
forecasts, is subjective, and goodness can vary depending on
the needs of the user. In this research, several measures
of forecasting accuracy will be used to quantify model
results. The measures selected for this analysis are the
mean percent error (MPE), the mean absolute percent error
(MAPE), and the Pearson correlation coefficients adjusted for
degrees of freedom. MPE is defined as:

$$MPE = \frac{100}{n} \sum_{t=1}^{n} (A_t - P_t)/A_t$$

MAPE is defined as:

$$MAPE = \frac{100}{n} \sum_{t=1}^{n} |A_t - P_t|/A_t$$

where

$A_t$:   actual or realized value at time t
$P_t$:   prediction or forecast value at time t

The Pearson correlation coefficients are defined as

$$R^2\ (\text{fitted}) = 1 - \frac{Var(r)/dof}{Var(Y)/dof}$$

$$R^2\ (\text{validation}) = 1 - \frac{Var(rr)/dof}{Var(Y)/dof}$$

where

Var(r):   sample variance of the residuals of the fitted
model

Var(rr):  sample variance of the residuals of the forecast
values

Var(Y):   sample variance of the developmental dependent
data

Whereas MPE provides a measure of the percent bias in the
forecasts, MAPE will always be at least as large as the
absolute value of MPE and provides a measure of dispersion
of the forecasts (see Boger and Jayachandran, Ref. 23:p. 11).
Comparison of $R^2$ (fitted) and $R^2$ (validation) quantitatively
evaluates the relative variability of the forecasting
ability of the model beyond the developmental range.
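
The two percent-error measures can be computed as in the sketch below, which assumes the standard definitions (signed percent errors averaged for MPE, absolute percent errors averaged for MAPE). The function names and the sample values are hypothetical.

```python
def mpe(actual, predicted):
    """Mean percent error: average of 100 * (A - P) / A. Signed, so
    over- and under-predictions can cancel."""
    n = len(actual)
    return 100.0 / n * sum((a - p) / a for a, p in zip(actual, predicted))


def mape(actual, predicted):
    """Mean absolute percent error: average of 100 * |A - P| / |A|."""
    n = len(actual)
    return 100.0 / n * sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted))


if __name__ == "__main__":
    # Hypothetical forecasts: one 10 percent too high, one 10 percent too low.
    actual = [100.0, 200.0]
    predicted = [110.0, 180.0]
    print(mpe(actual, predicted), mape(actual, predicted))
```

In this example the signed errors offset each other, so MPE reports no bias while MAPE still reports the typical size of the misses, which is why the two measures are read together.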

In this study, the level of the independent variable
beyond the developmental range is fixed. The conditional
predictions of the dependent variable, $Y_t | X_t$, for the
regression models for each learning curve specification are
based on the following relations:

1)  Linear Regression Model

a)  Autocorrelation is not modeled

$$\ln \hat{Y}_t = \ln \hat{\beta}_0 + \hat{\beta}_1 \ln X_t$$

$$\hat{Y}_t = \exp(\ln \hat{Y}_t)$$

b)  Autocorrelation is modeled

$$\hat{Y}_t = \hat{\rho}\, Y_{t-1} + (1 - \hat{\rho})\hat{\beta}_0 + (X_t - \hat{\rho} X_{t-1})\hat{\beta}_1$$

where $Y_{t-1}$ is equal to exp{the last fitted value of
the developmental data} for the initial predicted
value.

2)  Nonlinear Regression Model

a)  Autocorrelation is not modeled

$$\hat{Y}_t = \hat{\beta}_0 X_t^{\hat{\beta}_1}$$

b)  Autocorrelation is modeled

$$\hat{Y}_t = \hat{\rho}\, Y_{t-1} + \hat{\beta}_0 \left( X_t^{\hat{\beta}_1} - \hat{\rho} X_{t-1}^{\hat{\beta}_1} \right)$$

where $Y_{t-1}$ is equal to the last fitted value of the
developmental data for the initial predicted value.

where

$\hat{\beta}_0, \hat{\beta}_1$:   estimated parameters of the regression

$X_t$:   independent variable of the bivariate data that
was not used for developing the model

$\hat{\rho}$:   estimated autocorrelation parameter
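
The recursive forecasts for the nonlinear model can be sketched as below. The autocorrelated case assumes the relation Y_t = rho*Y_(t-1) + b0*(X_t^b1 - rho*X_(t-1)^b1), which follows from quasi-differencing the power model with an additive AR(1) error; the function names, parameter values, and seed value are hypothetical and do not come from the study's results.

```python
def forecast_power(x_new, b0, b1):
    """Forecasts from the power model without autocorrelation: b0 * x**b1."""
    return [b0 * x ** b1 for x in x_new]


def forecast_power_ar1(x_new, x_last, y_last, b0, b1, rho):
    """Recursive forecasts when first-order autocorrelation is modeled.

    x_last and y_last are the last developmental X value and the value
    used to seed the recursion (the last fitted value); each forecast
    then feeds the next one.
    """
    predictions = []
    prev_x, prev_y = x_last, y_last
    for x in x_new:
        y = rho * prev_y + b0 * (x ** b1 - rho * prev_x ** b1)
        predictions.append(y)
        prev_x, prev_y = x, y
    return predictions


if __name__ == "__main__":
    # Hypothetical parameters; the seed sits 10 units above the fitted curve.
    b0, b1, rho = 1000.0, -0.3, 0.5
    x_last, x_new = 18.0, [19.0, 20.0]
    seed = b0 * x_last ** b1 + 10.0
    print(forecast_power(x_new, b0, b1))
    print(forecast_power_ar1(x_new, x_last, seed, b0, b1, rho))
```

Under this relation any gap between the seed value and the fitted curve is carried forward and shrinks by a factor of rho at each step, so the autocorrelated forecasts converge toward the plain power-model forecasts.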


F.   COMPARISONS 

In this study, three learning curve specifications are
being investigated: the unit learning curve, the cumulative
average learning curve, and the Boger et al. learning
curve. Each specification will be fitted using both a
simple linear regression model and a nonlinear regression
model.

1.  Regression  Models 

The first comparison that will be investigated,
which is of secondary interest in this study, will be the
relative fit of each model and the differences between the
linear regression and nonlinear regression methods, with and
without transformations of the data for autocorrelation, for
each learning curve specification. The approach to be used
for these comparisons will be strictly graphical. For each
model specification the dependent variable of the
developmental data will be plotted against the observed dependent
variable of the developmental data and each of the fitted
variables.

2.   Learning Curve Specifications

The basis for comparison between the unit,
cumulative average, and Boger et al. learning curve
specifications is the difference between actual cost per
lot and each model's fitted cost per lot. Each model's
fitted cost per lot can be arrived at through some
relatively simple calculations using the data refinement
procedures discussed above, applied to the results of each
regression technique. The initial comparison of the fitted
lot costs will be done graphically. For each model
specification and regression technique, the observed cost
per lot and the fitted cost per lot will be plotted against
the respective lot numbers for the data within the
developmental range. Where the observed and fitted lot
costs are not obviously different by graphical means, a
statistical test will be employed to attach statistical
significance to the difference. The non-parametric test to
be utilized will be the Kruskal-Wallis test
[Ref. 22:p. 229], where the populations are the different
model specifications and regression techniques. The samples
within each population are the absolute values of the
differences between the observed and fitted cost per lot.

V.   RESULTS

A.   DATA  ANALYSIS 

Two  sets  of  production  data  and  three  learning  curve 
specifications  for  each  data  set  were  investigated  in  this 
research.  A  fairly  extensive  analysis  was  performed  on  the 
residuals  of  each  type  of  regression  for  each  learning  curve 
specification  and  each  data  set.  The  results  of  each 
analysis,  generally,  led  to  further  modifications  of  the 
data  calling  for  even  more  regressions  and  residual 
analyses. Twenty-six regressions, sixteen linear regressions
and ten nonlinear regressions, were performed during
the  course  of  this  study.  For  the  sake  of  brevity,  only  one 
analysis  for  a  single  learning  curve  specification  and 
production  data  set,  which  was  typical  of  the  analyses 
performed  in  all  other  cases,  will  be  discussed  at  length. 
The  results  of  the  other  regressions  and  analyses  are 
tabulated  in  Appendices  D,  E,  F,  G,  H,  and  I. 

1.   Boger et al. Model:   C-141 Data Analysis

The first 18 of the 24 total bivariate observations
were selected to fit the linear regression model for the
Boger et al. specification of the learning curve. The
remaining six data points were withheld for validation
purposes. Figure 1 is a scatter plot of the raw data and
the natural log (ln) transformed data. The ln transformed
data scatter plot has seventeen data points because the first
observation of the independent variable vector was
necessarily omitted: its value is infinite in magnitude when
ln transformed.

Figure 1.   Raw Data and Ln Transformed Data

The first linear regression was performed using the
17 data points (observations 2 through 18). Inspection of
the residuals plotted against time and against the fitted
values, Figure 2, revealed that the residuals were not
patternless. The systematic structure of the residuals
implied that the residuals did not reflect the assumptions
of the linear model. The cyclic pattern of the residuals,
furthermore, suggested the presence of first-order
autocorrelation and encouraged more investigation. The
Durbin-Watson statistic derived from this set of residuals led to a
rejection of the null hypothesis ($H_0$: $\rho = 0$), implying
statistical significance of the presence of first-order
autocorrelation. The initial inspection of the residuals also
addressed the question of outliers. Since no residuals were
outside the interval specified for data rejection, no
observations were omitted from the data set. Table IV
highlights the results of the initial linear regression.

Since the sample size was small in relation to the
number of parameters being estimated, the Theil and Nagar
estimate for the first-order autocorrelation, $r_2$, was
utilized. The values in parentheses adjacent to the
estimated parameters are the Student's t statistics for the
respective coefficients.

[Figure 2: Residuals vs Time: 17 Observations]

TABLE IV
LINEAR REGRESSION 1 RESULTS

$\ln \beta_0$ :   13.332  (123.21, $\alpha$ << .001)
$\beta_1$     :   -.2821  (-13.00, $\alpha$ << .001)
D.W.          :   .5941
N             :   17
$\rho$        :   .6987
$R^2$         :   .92
$R^2$ adj.    :   .91

The autocorrelation was then modeled into the
ln-transformed data, resulting in Figure 3. The data point in
the upper left-hand corner seems to be a typical result when
autocorrelation is modeled into the data using the technique
employed in this study. A second linear regression was
performed on these 17 observations. The scatter plots of the
residuals against time and against the fitted values,
Figure 4, again were not patternless and suggested the
presence of autocorrelation. Due to the small sample size,
the first observation had a dramatic effect on the
regression and, subsequently, the residuals. The
Durbin-Watson statistic again reflected a statistically
significant amount of autocorrelation present in the
residuals. Further modeling of autocorrelation into the data
yielded similar results. Inspection of the probability plot,
Figure 5, a symmetry plot of the residuals, the "rough cut"
measure of normality (94 percent of the standardized
residuals within the appropriate Student's t value) and the
Hotelling-Pabst statistic (T=286, N=17) supporting constant
variance did not suggest major departures from the other
distributional assumptions of the model. The results of the
second linear regression are highlighted in Table V.

AUTOCORRELATION MODELED: 17 OBSERVATIONS
Figure 3.   Ln Transformed Data Adjusted for Autocorrelation
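
The transformation itself is defined elsewhere in this study; the sketch below only illustrates the general idea. It assumes the common Prais-Winsten form, in which every observation after the first is quasi-differenced and the first observation is retained by scaling it with sqrt(1 - ρ²). The function name, the sample values, and the value of ρ are all hypothetical.

```python
import math

def ar1_transform(series, rho, keep_first=True):
    """Quasi-difference a series for first-order autocorrelation.

    Assumed (Prais-Winsten) form, not necessarily the one used in the study:
      z*_1 = sqrt(1 - rho**2) * z_1
      z*_t = z_t - rho * z_(t-1),   t = 2..n
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    transformed = [series[t] - rho * series[t - 1]
                   for t in range(1, len(series))]
    if keep_first:
        transformed.insert(0, math.sqrt(1.0 - rho * rho) * series[0])
    return transformed

# Hypothetical ln-transformed values, for illustration only.
ln_y = [12.9, 12.8, 12.6, 12.4, 12.2]
rho = 0.70  # illustrative, roughly the size reported in Table IV

with_first = ar1_transform(ln_y, rho)
without_first = ar1_transform(ln_y, rho, keep_first=False)
print(with_first)
print(without_first)
```

Under this assumed form and a positive ρ, the retained first point is scaled by sqrt(1 - ρ²) while the remaining points shrink by roughly (1 - ρ), so the first point sits apart from the rest. That is consistent with the isolated point noted in Figure 3. Passing keep_first=False corresponds to omitting the first observation.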

RESIDUALS VS TIME: 17 OBSERVATIONS

TABLE V
LINEAR REGRESSION 2 RESULTS

ln β0    :  7.5234  (12.397, α << .001)
β1       :  -2.191  (-6.3097, α << .001)
D.W.     :  .78
N        :  17
ρ        :  .6054
R²       :  .73
R² adj.  :  .71

Figure 5.   Normal Probability Plot of Residuals

While considerable literature exists discussing the
need to retain the first observation for further
regressions after autocorrelation has been modeled into the
data, especially when sample size is small, the second
regression resulted in an unexpected value for β1. A third
regression was performed after omitting the first
observation to see what effects would be seen in parameter
estimation and prediction results. The scatter plot of the
residuals against time, Figure 6, appears to be more
randomly scattered in a narrow horizontal band about zero.
Furthermore, the probability plot and histogram, Figure 7,
and the "rough cut" measure of normality (94 percent of
the standardized residuals within the appropriate Student's
t value) support the distributional assumptions of the
model. The Durbin-Watson statistic and the test for
homoskedasticity (Hotelling-Pabst statistic, T=572, N=16)
suggested the other assumptions of the model were not
violated. The results of the third linear regression are
highlighted in Table VI.

RESIDUALS VS TIME: 16 OBSERVATIONS
Figure 6.   Residual Plot

NORMAL PROBABILITY PLOT
Figure 7.   Normal Probability and Density Plots

TABLE VI
LINEAR REGRESSION 3 RESULTS

ln β0    :  4.399   (35.785, α << .001)
β1       :  -.4877  (-7.146, α << .001)
D.W.     :  2.9
N        :  16
ρ        :  -.4730
R²       :  .78
R² adj.  :  .77


The nonlinear regressions were performed using 17
bivariate observations (2 through 18). The initial
parameter estimates for β0 and β1 were taken from the
results of the first linear regression. The other initial
values required by the STATGRAPHICS nonlinear estimation
panel used the system default values. The results of the
first nonlinear regression are highlighted in Table VII.
Inspection of the residuals plotted against time, Figure 8,
and the Durbin-Watson statistic led to acceptance of the
alternative hypothesis (H1: ρ > 0).

TABLE VII
NONLINEAR REGRESSION 1 RESULTS

β0    :  491696.31
β1    :  -.214
D.W.  :  .86
N     :  17
ρ     :  .5629
R²    :  .96

Figure 8.   Residual Plot
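
Seeding a nonlinear fit with starting values from a log-linear fit can be sketched as follows. The sketch uses a plain Gauss-Newton iteration with step halving on the power form y = β0·x^β1. It is a stand-in for illustration, not the algorithm inside the STATGRAPHICS nonlinear estimation panel, and the function name, data, and starting values are hypothetical.

```python
import math

def fit_power_curve(x, y, b0, b1, max_iter=100, tol=1e-12):
    """Least-squares fit of y = b0 * x**b1 by Gauss-Newton with step halving."""
    def sse(p0, p1):
        return sum((yi - p0 * xi ** p1) ** 2 for xi, yi in zip(x, y))

    current = sse(b0, b1)
    for _ in range(max_iter):
        # Accumulate the 2x2 normal equations (J'J) delta = J'r.
        a00 = a01 = a11 = g0 = g1 = 0.0
        for xi, yi in zip(x, y):
            base = xi ** b1
            j0 = base                      # partial derivative w.r.t. b0
            j1 = b0 * base * math.log(xi)  # partial derivative w.r.t. b1
            r = yi - b0 * base
            a00 += j0 * j0
            a01 += j0 * j1
            a11 += j1 * j1
            g0 += j0 * r
            g1 += j1 * r
        det = a00 * a11 - a01 * a01
        if det == 0.0:
            break
        d0 = (a11 * g0 - a01 * g1) / det
        d1 = (a00 * g1 - a01 * g0) / det
        # Halve the step until the sum of squared errors decreases.
        step = 1.0
        while step > 1e-10:
            trial = sse(b0 + step * d0, b1 + step * d1)
            if trial < current:
                break
            step /= 2.0
        else:
            break  # no improving step found; treat as converged
        b0 += step * d0
        b1 += step * d1
        improvement = current - trial
        current = trial
        if improvement <= tol * (1.0 + current):
            break
    return b0, b1, current

# Synthetic, noise-free data generated from b0 = 1000, b1 = -0.3.
x = [float(i) for i in range(1, 11)]
y = [1000.0 * xi ** -0.3 for xi in x]

# Starting values play the role of the log-linear estimates,
# e.g. b0 = exp(ln b0) and b1 taken from a first linear regression.
b0_hat, b1_hat, sse_hat = fit_power_curve(x, y, b0=900.0, b1=-0.25)
print(b0_hat, b1_hat, sse_hat)
```

The nonlinear fit minimizes squared error in the original units, whereas the log-linear fit minimizes squared error in ln units, so the two need not give the same estimates even on the same observations.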
The second nonlinear regression was performed on the
same 17 bivariate observations after autocorrelation was
modeled, Figure 9. The results of this regression are
highlighted in Table VIII.

TABLE VIII
NONLINEAR REGRESSION 2 RESULTS

β0    :  307094.63
β1    :  -.382
D.W.  :  2.44
N     :  17
ρ     :  -.2371
R²    :  .94

Figure 9.   Ln Transformed Data, Autocorrelation
Transformation, First Observation Omitted

Inspection of the residuals plotted against time, Figure 10,
revealed the residuals to be patternless and lying in a
narrow interval around zero. While the test for constant
variance (Hotelling-Pabst statistic, T=878, N=17), the
Durbin-Watson statistic and the "rough cut" measure of
normality (94 percent of the standardized residuals within
the appropriate Student's t value) support the assumptions
of the model, the probability and density plots, Figure 11,
suggest major departures from the assumption of normality of
the error term. The implications of the residuals not
reflecting the assumptions of the model will be discussed in
the structural analysis portion of the results and in the
conclusions.
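
The Hotelling-Pabst statistic quoted above is a sum of squared rank differences, T = Σ[R(X) - R(Y)]². The sketch below shows the computation. It assumes the statistic is applied to pairs of (time index, absolute residual) as a check for a trend in residual spread; that pairing is an assumption of this sketch and is not stated here. The function names and residual values are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def hotelling_pabst(x, y):
    """T = sum of squared differences between the ranks of x and y."""
    rx = average_ranks(x)
    ry = average_ranks(y)
    return sum((a - b) ** 2 for a, b in zip(rx, ry))

def expected_t_under_independence(n):
    """Mean of T when the two rankings are unrelated: n(n^2 - 1)/6."""
    return n * (n * n - 1) / 6.0

# Hypothetical residuals, for illustration only.
residuals = [0.04, -0.02, 0.05, -0.01, 0.03, -0.06, 0.02, -0.03]
time_index = list(range(1, len(residuals) + 1))
t_stat = hotelling_pabst(time_index, [abs(e) for e in residuals])
print(t_stat, expected_t_under_independence(len(residuals)))
```

Values of T far below the independence mean indicate that the two rankings rise together, and values far above it indicate that they move in opposite directions. For N=17 the independence mean is 816.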

B.   VALIDATION 

The validation phase of this study consisted of a
predictive analysis of the different model specifications
and the regression techniques utilized. The initial
investigation of the predictive ability of each case
employed the prediction accuracy measures of MPE, MAPE, and
the Pearson correlation coefficients adjusted for degrees of
freedom. The results of these calculations are tabulated in
Table IX. The predicted and fitted results of each model
specification and regression method were transformed into
the units of the original model specification, i.e., direct
labor hours for the Xth unit for the unit learning curve,
average cost per unit for the cumulative average learning
curve, and the average cost in direct labor hours for the
units produced in time period t for the Boger et al.
learning curve, prior to calculating the prediction accuracy
measures. While the results for a model specification are
comparable over the various regressions performed, the
results are not directly comparable across model
specifications.

TABLE IX
PREDICTION ACCURACY MEASURES--ENTIRE HOLDOUT SAMPLE

Regression     nd    nv      MPE     MAPE   R² (fitted)   R² (validate)

F-102 Data
L  Boger 1     17     5   -72.56    72.56       .291         -1.050
L  Boger 2     17     5    59.46    59.46    -15.161          -.052
L  Boger 3     16     5    61.58    61.58       .047           .077
NL Boger 1     17     5   -89.42    89.42       .587          -.284
NL Boger 2     17     5    38.20    38.20       .587          -.284
L  Cum 1      174    29    -4.08     4.08       .972           .999
NL Cum 1      174    29    -6.40     6.40       .979           .999
NL Cum 2      174    29    51.03    51.03       .174         -1.530
NL Cum 3      173    29    49.15    49.15       .018         -1.536
L  Unit 1     173    29     1.64     5.78       .876           .914
L  Unit 2     173    29    57.73    57.73       .562           .912
L  Unit 3     172    29    67.43    67.43       .863           .893
NL Unit 1     173    29    -3.29     5.81       .892           .915

C-141 Data
L  Boger 1     17     6     1.50     3.81       .747           .996
L  Boger 2     17     6    67.30    67.30     -15.95           .923
L  Boger 3     16     6    66.29    66.29       .087           .915
NL Boger 1     17     6   -29.44    29.44       .868           .996
NL Boger 2     17     6    61.30    61.30       .948           .960
L  Cum 1        9     3    -2.97     2.97       .980           .999
L  Cum 2        9     3    46.32    46.32     -13.96           .916
L  Cum 3        8     3    66.38    66.38      -.481           .815
NL Cum 1        9     3    -8.51     8.51       .986           .999
L  Unit 1       9     3    12.92    12.92       .976           .981
NL Unit 1       9     3     4.43     6.12       .985           .981

where
   nd     =  number of developmental data points
   nv     =  number of predicted data points
   L      =  linear regression model
   NL     =  nonlinear regression model
   Boger  =  Boger et al. learning curve specification
   Unit   =  Unit learning curve specification
   Cum    =  Cumulative average learning curve specification

The negative values for MPE reflected that the initial
regression, linear or nonlinear, for each specification
overestimated the actual costs. On the other hand, after
the transformation for autocorrelation was performed, the
models usually underestimated the actual costs. The most
striking feature of this table is the extremely large values
of percent error after autocorrelation was modeled. This
implied the predicted values severely underestimated the
actual costs and could have been caused by predicting values
too far outside the range of the developmental data. When
the first observation was omitted following the adjustment
for autocorrelation, the predictions were slightly more
biased--but not by a large amount. Whereas the MPE for the
Boger et al. model, F-102 data, implied the model did not
predict well at all, the MPE for the Boger et al. model,
C-141 data, reflected excellent predictability. The Boger
et al. model, F-102 data, MPE was not at all consistent with
the MPE values for the unit and cumulative average learning
curves using the F-102 data. Conversely, the Boger et al.
model, C-141 data, MPE was consistent with the results of
the other specifications using the C-141 data. This
observation could be due to unrealistic refinements to the
data or the difference in sample size. After the
transformation of the data for autocorrelation was made, the
predicted values of the Boger et al. model for both the
C-141 and F-102 data were extremely high but consistent with
the results of the other specifications. Another result was
that the MPE values for each of the nonlinear regressions
(with no adjustment for autocorrelation) were larger than
those of the respective linear regressions.

In  most  cases,  MAPE  was  the  absolute  value  of  the 
respective  MPE  value.  This  implied  that  the  models 
generally  did  not  produce  predictions  that  bracketed  the 
actual  values  but  rather  predicted  costs  that  were 
consistently  either  above  or  below  the  actual  costs. 
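
The relationship between MPE and MAPE described above can be illustrated with a short sketch. The percent error is taken here as 100 × (actual - predicted) / actual. That sign convention is inferred from the statement that negative MPE values correspond to overestimated costs, and the function names and cost figures are hypothetical.

```python
def percent_errors(actual, predicted):
    """Percent error per point; negative when the prediction is too high."""
    return [100.0 * (a - p) / a for a, p in zip(actual, predicted)]

def mpe(actual, predicted):
    """Mean percent error: a signed measure of bias."""
    errors = percent_errors(actual, predicted)
    return sum(errors) / len(errors)

def mape(actual, predicted):
    """Mean absolute percent error: a measure of accuracy without sign."""
    errors = percent_errors(actual, predicted)
    return sum(abs(e) for e in errors) / len(errors)

# Hypothetical costs, for illustration only.
actual = [100.0, 200.0]
one_sided = [110.0, 220.0]    # every prediction is too high
actual_flat = [100.0, 100.0]
bracketing = [110.0, 90.0]    # predictions fall on both sides

print(mpe(actual, one_sided), mape(actual, one_sided))
print(mpe(actual_flat, bracketing), mape(actual_flat, bracketing))
```

When every prediction errs in the same direction, MAPE equals the absolute value of MPE. When the predictions bracket the actual values, the signed errors partly cancel and MPE is smaller in magnitude than MAPE.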

Prior to the data being adjusted for autocorrelation,
the R² (fitted) and R² (validate) values were in the
interval (.75, .99) except for the Boger et al. model for
the F-102 data. While the Boger et al. linear and nonlinear
models, C-141 data, had slightly larger differences of R²
values than the other specifications (reflecting slightly
more variability in prediction results), the Boger et al.
linear and nonlinear models, F-102 data, reflected extremely
high variability of the fitted and predicted residuals
relative to the variability of the dependent variable of the
development data--which is not a desirable trait of a model.
Negative values for R² are indicative of cases where the
sample variance of the residuals is higher than the sample
variance of the developmental dependent variable. In all
cases, when the autocorrelation transformation was
incorporated, the R² values decreased and the differences
between R² (fitted) and R² (validate) grew larger.
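
How a negative R² can arise is easiest to see numerically. The sketch below assumes R² is formed as one minus the ratio of the sample variance of the residuals to the sample variance of the developmental dependent variable, which is the reading suggested by the paragraph above. The exact degrees-of-freedom adjustment used in this study is not reproduced, and the function name and numbers are hypothetical.

```python
import statistics

def variance_ratio_r2(residuals, y_developmental):
    """Assumed form: 1 - var(residuals) / var(developmental y)."""
    return 1.0 - (statistics.variance(residuals)
                  / statistics.variance(y_developmental))

# Hypothetical values, for illustration only.
y_dev = [1.0, 2.0, 3.0, 4.0]
small_residuals = [-0.1, 0.1, -0.1, 0.1]
large_residuals = [-10.0, 10.0, -10.0, 10.0]

print(variance_ratio_r2(small_residuals, y_dev))
print(variance_ratio_r2(large_residuals, y_dev))
```

With residuals that vary far more than the dependent variable, the ratio exceeds one and the measure turns negative, as with several of the entries tabulated after autocorrelation was modeled.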
The same prediction accuracy measures were calculated
for predicted values not as far outside the developmental
data range. These results are tabulated in Table X.
Whereas the MPE and MAPE values decreased slightly (except
for the Boger et al. model, F-102 data), the R² values
remained largely unchanged. The same trends described for
the previous table apply to this table also. The
implication of the results reflected in this table was that
the range of the predicted values outside the developmental
range had little effect on the initial prediction accuracy
measures.

C.   STRUCTURAL  ANALYSIS 

In most cases, the error process of the linear and
nonlinear statistical models did not exhibit the desired
normally distributed, random structure but, instead,
exhibited a structure in which the errors of adjacent
observations were related to each other. As discussed
above, the presence of autocorrelation in the residuals of a
model results in biased estimates of the standard errors of
the regression coefficients. Hence, the standard t-tests
for significance of the difference of the estimates of the
regression coefficients from zero, and the coefficients of
determination, may be erroneous.

TABLE X
PREDICTION ACCURACY MEASURES--PORTION OF HOLDOUT SAMPLE

Regression     nd    nv      MPE     MAPE   R² (fitted)   R² (validate)

F-102 Data
L  Boger 1     17     3   -19.23    19.23       .291           .704
L  Boger 2     17     3    55.54    55.54    -15.161          -.998
L  Boger 3     16     3    56.61    56.61       .047          -.823
NL Boger 1     17     3   -30.33    30.33       .333           .725
NL Boger 2     17     3    46.44    46.44       .587          -.301
L  Cum 2      174    10    -3.81     3.81       .972           .999
NL Cum 1      174    10    -6.03     6.03       .979           .999
NL Cum 2      174    10    24.90    24.90       .173          -1.55
NL Cum 3      173    10    23.98    23.98       .018          -1.56
L  Unit 2     173    10    -4.74     4.74       .876           .946
L  Unit 3     173    10    54.56    54.56       .562           .831
L  Unit 4     172    10    64.55    64.55       .863           .764
NL Unit 1     173    10    -9.85     9.85       .892           .946

C-141 Data
L  Boger 1     17     4     1.96     5.22       .747           .989
L  Boger 2     17     4    57.16    57.16     -15.95           .858
L  Boger 3     16     4    56.32    56.32       .087           .842
NL Boger 1     17     4   -28.47    28.47       .868           .989
NL Boger 2     17     4    55.40    55.40       .949           .910
L  Cum 1        9     2    -3.46     3.46       .980           .999
L  Cum 2        9     2   -39.35    39.35     -13.96           .860
L  Cum 3        8     2    57.65    57.65      -.481           .682
NL Cum 1        9     2    -2.02     2.56       .986           .999
L  Unit 1       9     2     7.55     7.55       .976           .999
NL Unit 1       9     2    -1.26     1.26       .985           .999

In all cases where the Durbin-Watson test for
autocorrelation resulted in accepting the alternative
hypothesis (H1: ρ > 0), this problem was addressed by
modeling this phenomenon into the data and performing
subsequent regressions. In every case, the R² value of the
regression decreased after modeling AR(1) and then
increased after the first observation was omitted.
Similarly, the t-statistics followed the same trend, and, in
all cases, the estimated coefficients were statistically
significant. The statistical significance of the estimated
coefficients and the R² values (listed in Appendices D, E,
F, G, H, I) indicated that there is indeed a good amount of
information contained in, and a good deal of the variation
is explained by, the regression model.

After  modeling  the  autocorrelation  into  the  data  and 
performing  follow-on  regressions,  the  nature  of  the 
residuals  changed.  The  initial  regression  usually  generated 
results  that  had  a  distinct  cyclic  pattern.  The  follow-on 
regressions  reflected  a  linear  pattern  in  two  cases,  but 
always  a  non-cyclic  pattern--usually  patternless. 

In all cases after autocorrelation was modeled, the
residuals also appeared to be, and were statistically
verified to be, homoskedastic. Other distributional
observations were made. In the small samples (N=9, C-141
data, unit and cumulative average models), the residuals of
the follow-on regressions, both linear and nonlinear, met
the "rough-cut" requirements for normality. These normal
assumptions were further reflected in the probability and
symmetry plots and the estimated third and fourth moments.
In the mid-sized samples (N=17, C-141 and F-102 Boger et al.
data), the residuals of the follow-on regressions reflected
the normal assumptions through the "rough-cut" requirements,
the probability and symmetry plots and the estimated third
and fourth moments (except for the C-141 nonlinear
regression for the Boger et al. data). While the "rough-cut"
requirements were met for the large sample sizes (N=173,
F-102 unit and cumulative average data), the probability and
symmetry plots and the estimated third and fourth moments
suggested that major departures from the assumptions of
normality existed. These inconsistent observations may be
caused by the differences in sample sizes, adjustments that
were done to the data, or poor models. It also appeared that
the "rough-cut" measures of normality were not very
discriminating.

D.   COMPARISON  OF  FITTED  MODELS 

One  of  the  secondary  aspects  of  this  research  was  to 
graphically  compare  the  fitted  models,  both  linear  and 
nonlinear,  against  the  observed  developmental  data  in  the 
units  of  the  original  models. 

The fitted model results for the Boger et al. model,
linear and nonlinear regressions, C-141 data, are plotted in
Figure 12. The observed independent variable of the
developmental data is plotted against the observed and
fitted dependent variable values. As discussed above, the
units for each fitted model have been transformed into the
units of the original model. The initial linear regression
with no autocorrelation modeled into the data, surprisingly,
has a better fit than its nonlinear counterpart. After the
transformation for autocorrelation was performed, however,
the linear model had a poor fit while the nonlinear
regression had an excellent fit. Whereas a third nonlinear
regression was not performed, the linear regression with
autocorrelation modeled and the first observation dropped
had a poor initial fit but an excellent fit for the latter
part of the developmental data range.

Figure 12.   Fitted Model Results: Boger et al. Model,
C-141 Data

The remaining fitted models are listed in Appendix J.
Generally speaking, the observations of each fitted model
and regression technique were consistent across both sets of
data. Prior to the adjustment for autocorrelation, both the
linear and nonlinear regressions were comparable (except in
the case of the Boger et al. model, F-102 data). This was
a surprising result since one would expect the nonlinear
regression to have a much better fit than the linear
regression for nonlinear data.

After the transformation for autocorrelation was made,
the fitted linear models appeared to fit poorly. On the
other hand, the fitted nonlinear models, while not as good
as the model prior to the adjustment for autocorrelation,
appeared to have better fits than their linear counterparts.
Finally, after the initial observation was omitted
following the transformation for autocorrelation, an
interesting observation was noted. In all cases, the fitted
model, both linear and nonlinear, was poor for the initial
portion of the developmental data but appeared to be an
excellent fit for the latter portion of the developmental
data.

E.   COMPARISON  OF  FITTED  LOT  COSTS 

The cost for each lot derived from the fitted models for
each of the regressions performed for both the C-141 and the
F-102 data is plotted against the observed cost per lot in
Appendix K. The fitted lot costs for the C-141 data are
plotted for lots two through eight. Only these seven lots
are plotted and used for comparison since omission of data
points in some regressions and production data for a lot
lying outside the developmental data range result in
incomplete fitted lot costs. The fitted lot costs for the
C-141 are plotted against the respective observed lot costs
[Ref. 5:p. 267] in each plot. The fitted lot costs for
the F-102 data are plotted for lots four through nine for
the same reasons cited above. The observed lot costs for
the F-102 data are not the same for each plot since some
outliers were initially identified and omitted (not always
the same points) prior to performing the regression.
Inclusion of these outliers in the calculation of observed
lot costs, in some cases, would bias the fitted lot costs
down.

Visual inspection of the fitted lot cost plots,
Appendix K, gives a good impression of the fit of each
specification of the learning curve to the lot costs. Since
each specification has been translated into fitted costs per
lot, a basis exists for comparison across regression
techniques and learning curve specifications. Figure 13 is
an example of one plot of the fitted costs per lot for the
Boger et al. model, nonlinear regression, C-141 data.
Whereas the initial regression does not appear to provide a
good fit to the observed data, the fitted costs per lot
after autocorrelation was modeled have a much better fit.

Figure 13.   Fitted Lot Costs Results: Boger et al. Model,
Nonlinear Regression, C-141 Data
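
Translating a fitted curve into a cost per lot can be sketched for the two classical specifications. The sketch assumes the standard power forms: under the unit curve the cost of unit x is β0·x^β1, and under the cumulative average curve the average cost of the first X units is β0·X^β1, so their total cost is β0·X^(β1+1). The function names and parameter values are hypothetical, and the Boger et al. specification is not covered.

```python
def lot_cost_unit(b0, b1, first_unit, last_unit):
    """Lot cost under the unit curve: sum of b0 * x**b1 over units in the lot."""
    return sum(b0 * x ** b1 for x in range(first_unit, last_unit + 1))

def total_cost_cumavg(b0, b1, units):
    """Total cost of the first `units` units under the cumulative average curve."""
    if units == 0:
        return 0.0
    return b0 * units ** (b1 + 1.0)

def lot_cost_cumavg(b0, b1, first_unit, last_unit):
    """Lot cost as the difference of cumulative totals at the lot boundaries."""
    return (total_cost_cumavg(b0, b1, last_unit)
            - total_cost_cumavg(b0, b1, first_unit - 1))

# Hypothetical parameters and lot boundaries, for illustration only.
b0, b1 = 400000.0, -0.3
lots = [(1, 5), (6, 11), (12, 21)]
for first, last in lots:
    print(first, last,
          lot_cost_unit(b0, b1, first, last),
          lot_cost_cumavg(b0, b1, first, last))
```

The same pair (β0, β1) gives different lot costs under the two specifications, which is why each fitted model has to be converted with its own formula before the lot costs are compared.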

In general, the unit learning curve, both linear and
nonlinear regression techniques, with and without
transformations for autocorrelation, provided the best
fitted lot costs for the F-102 production data. With
respect to the cumulative average learning curve
specification, the linear and nonlinear regressions without
transformations for autocorrelation appear to have excellent
lot cost fits--not as good as, but comparable to, the unit
specification fits. The fitted lot costs for the cumulative
average model, nonlinear regression with the transformation
for autocorrelation, appear to have reasonable fits--but not
as good as their unit specification counterparts. Whereas
the linear regression for the Boger et al. model appears to
have a better fit than its nonlinear counterpart (except
when autocorrelation is modeled) and a good fit overall, the
fitted lot costs do not compare favorably with the
cumulative average and the unit learning curve
specifications. A nonparametric statistical test was then
performed comparing the linear and nonlinear regression
results, no autocorrelation modeled, of all three models.
The purpose of this test was to statistically compare the
fitted lot costs for each model. As discussed above, the
Kruskal-Wallis test was performed using the vectors of
differences between the fitted lot costs and observed lot
costs for the different models as the treatments. The null
hypothesis (each model tends to yield identical residual lot
costs) was rejected with the Kruskal-Wallis test statistic
T = 19.07, 5 degrees of freedom, .001 < α < .005. Multiple
comparisons were then performed between models with
α = .05, 30 degrees of freedom. At this level, the Boger
et al. model, both linear and nonlinear regression results,
tended to yield larger residual lot costs than both the unit
and cumulative average models. The cumulative average and
unit learning curve specifications tended to yield residual
lot costs that were statistically equal.
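
The Kruskal-Wallis statistic used above can be computed as in the following sketch. It implements the rank-based form with a correction for ties, which reduces to the familiar 12/[N(N+1)] Σ(R_i²/n_i) - 3(N+1) when no ties are present. The function names and residual lot costs are hypothetical. The five degrees of freedom reported above are consistent with six treatments (three specifications, each fitted by linear and nonlinear regression).

```python
def average_ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def kruskal_wallis(groups):
    """Return (T, degrees of freedom) for a list of treatment samples.

    Raises ZeroDivisionError if every pooled value is identical.
    """
    pooled = [v for group in groups for v in group]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    correction = n_total * (n_total + 1) ** 2 / 4.0
    s2 = (sum(r * r for r in ranks) - correction) / (n_total - 1)
    between = 0.0
    start = 0
    for group in groups:
        rank_sum = sum(ranks[start:start + len(group)])
        between += rank_sum ** 2 / len(group)
        start += len(group)
    return (between - correction) / s2, len(groups) - 1

# Hypothetical residual lot costs for three treatments, illustration only.
treatments = [
    [120.0, -80.0, 45.0, 60.0],
    [300.0, 410.0, 280.0, 350.0],
    [-20.0, 15.0, 30.0, -5.0],
]
t_stat, dof = kruskal_wallis(treatments)
print(t_stat, dof)
```

The statistic is compared with a chi-square distribution on the returned degrees of freedom. A large value indicates that at least one treatment tends to yield larger residual lot costs than the others.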

With respect to the C-141 data, the unit learning curve
specification, linear and nonlinear regressions, appears to
have excellent fitted lot costs--seemingly better than the
cumulative average and Boger et al. specifications. The
linear and nonlinear fitted lot costs of the cumulative
average and Boger et al. models, contrary to the F-102
data, compared favorably. A nonparametric statistical test
was then performed comparing the linear and nonlinear
regression results, no autocorrelation modeled, of all three
models. The purpose of the test and data description are
the same as above. The null hypothesis was rejected with
the Kruskal-Wallis test statistic T = 13.22, 5 degrees of
freedom, with α = .05 and 36 degrees of freedom. At this
level, the unit specification, linear regression, tended to
yield smaller residual lot costs. All the other models
tended to yield statistically equal residual lot costs.

Generally  speaking,  the  linear  models,  with  the 
transformation  for  autocorrelation  performed,  resulted  in 
very  poor  lot  cost  fits  for  both  sets  of  data.  On  the  other 
hand,  the  nonlinear  regressions  with  autocorrelation  modeled 
resulted  in  reasonable  fits.  Similarly,  when  the  first 
observation  was  omitted  after  modeling  autocorrelation,  the 
fitted  lot  costs  were  reasonable  for  both  the  linear  and 
nonlinear  regression  techniques. 
VI.   CONCLUSIONS

The  primary  purpose  of  this  research  was  to  empirically 
investigate  the  validity  of  a  reformulation  of  the 
cumulative  average  learning  curve  derived  and  discussed  by 
Boger,  Jones  and  Sontheimer  in  "Budgets,  Contracts, 
Incentives  and  Costs:  A  Stylized  Nexus"  [Ref.  14:p.  23]. 
In  the  process  of  conducting  this  investigation,  the  impacts 
of  linear  versus  nonlinear  regression  methods  and  modeling 
autocorrelation  were  also  addressed. 

The linear and nonlinear Boger et al. models for both
sets of data, before autocorrelation was modeled, while not
as good as the fitted cumulative average and unit learning
curve models, did not suggest gross inadequacies.
Similarly, the fitted cost per lot for the Boger et al.
model, while statistically different from the cumulative
average and unit specifications for the F-102 data, was not
statistically different from the cumulative average model
for the C-141 data. Again, the plots of the fitted costs per
lot did not suggest gross inadequacies of the Boger et al.
model.

Surprisingly, it was also noted that the nonlinear
regressions did not consistently provide much better fitted
models and fitted lot costs. Also, in agreement with other
literature and research, the unit learning curve
specification generally provided better fitted models and
fitted lot costs than both other models.

The predictive ability of the Boger et al. model for
the C-141 data was consistent with the cumulative average
and unit specifications. This was not true for the F-102
data and is partly blamed on the noise in the data in the
case of the Boger et al. model.

Whenever  autocorrelation  was  modeled  into  the  data, 
poorly  fitted  lot  costs  emerged  in  the  linear  regression 
cases.  On  the  other  hand,  when  autocorrelation  was  modeled 
during  the  nonlinear  regressions,  the  results  were  not 
substantially  degraded.  The  predictive  ability  of  all 
models  was  adversely  affected  when  the  autocorrelation  was 
modeled. Areas for further research would include other
methods of autocorrelation modeling and the effects that
other estimates of ρ might have.

While the structure of the residuals did not always
reflect the assumptions of the model being analyzed, which
might lead one to consider rejecting the model, Pesaran
cautions:

    There is no theoretical justification for expecting a
    correctly specified model to possess all the
    characteristics of the classical regression models. The
    assumptions underlying the classical regression models are
    made, not because they are optimal from the point of view
    of economic theory, but because they are extremely
    convenient for estimation and hypothesis testing purposes.
    [Ref. 24:p. 154]

While observing some contradictory results between the
two sets of aircraft production data, this researcher feels
that the results generally suggest the Boger et al.
learning curve specification is an adequate model. This
conclusion is tempered by several observations. It is felt
that the C-141 and F-102 data used were a severe limitation
to the scope of this study. While the sample size of the
F-102 data was generally large enough for the analysis, the
adjustments made to the data to meet the form required by
the Boger et al. model (discussed in detail by Womer and
Gulledge [Ref. 12:p. 81]) are rough approximations and have
introduced considerable noise into the data. On the other
hand, whereas the data for the C-141 analysis appeared to be
very smooth, the small sample size was a limitation. This
researcher feels that a more conclusive analysis could be
performed with considerably more effort going into the data
gathering stage with dialogue between the analyst and the
data source. Finally, the adjustments made to the data for
the Boger et al. model in this study used equivalent units
produced per time period based on approximate production
rates to generate the independent and dependent variables.
Other proxy variables might also be worth investigating.

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ADJUSTED  CUMULATIVE  AVERAGE  LEARNING  CURVE  DATA 


C-141 DATA

INDEP   DEPEN
5       423225.4
11      357139.3909
21      292402.7619
36      240278.9694
66      188599.7909
94      163846.0798
122     146553.9418
150     134506.1453
184     124547.1429
217     117384.8157
250     111631.5908
284     108303.619

F-102 DATA

INDEP   DEPEN
1       198785.6875
2       191219.6804
3       185213.8301
4       173896.6535
5       167952.226
6       161332.6875
7       161401.8577
8       157705.6925
9       156184.7955
10      151562.5646
11      149342.8082
12      145756.1199
13      145005.0999
14      141416.6299
15      138694.999
16      135473.3503
17      134044.4885
18      132746.108
19      131026.6435
20      128931.8526
21      127743.2604
22      126916.129
23      125731.7915
24      126346.4326
25      125079.5524
26      124055.6411
27      122825.3994
28      121565.7707
29      120297.7286
30      119340.3851
31      118120.5626
32      117616.1813
33      116865.7062
34      115882.5315
35      115336.4696
36      114716.1899
37      113787.6912
38      113241.2048
39      112406.6125
40      111766.7878
41      111450.0828
42      110949.7169
43      110506.8852
44      109814.2177
45      109274.077
46      108645.3467
47      107997.0418
48      107322.9439
49      107738.7195
50      107099.1025
51      106447.8066
52      105813.0005
53      105207.2654
54      104588.9196
55      103950.844
56      103558.5122
57      102967.6076
58      102511.1905
59      102119.26
60      101624.3641
61      101068.8712
62      100587.5227
63      100092.0185
64      99607.15497
65      99165.3994
66      98724.14488
67      98249.21286
68      97811.40959
69      97359.22169
70      96920.15013
71      96502.682
72      96487.96392
73      96062.59266
74      95646.50853
75      95231.91679
76      94936.76384
77      94523.38365
78      94115.95844
79      93684.01389
80      93288.39254
81      92867.76904
82      92492.23391
83      92125.37459
84      91759.38259
85      91397.79078
86      91065.10999
87      90709.44521
88      90592.83559
89      90250.72112
90      89921.49648
91      89609.36075
92      89300.22251
93      89275.92278
94      88962.94892
95      88910.61425
96      88606.39735
97      88336.35269
98      88055.24135
99      87726.11904
100     87417.2353
101     87114.22101
102     86810.91376
103     86504.70718
104     86208.22007
105     85924.00209
106     85643.41758
107     85361.04086
108     85082.73006
109     84818.42297
110     84558.93712
111     84306.83222
112     84056.04074
113     83793.9249
114     83521.40167
115     83276.40252
116     83039.14368
117     82780.02993
118     82559.27581
119     82464.90296
120     82229.78779
121     81982.89915
122     81755.64561
123     81543.97334
124     81323.88326
125     81120.26343
126     80903.43738
127     80692.32284
128     80468.89023
129     80249.76885
130     80233.93735
131     80213.40972
132     80221.90155
133     80015.78865
134     79808.75137
135     79618.02026
136     79400.12205
137     79222.23675
138     79019.00689
139     78817.75404
140     78631.05443
141     78454.55837
142     78265.98632
143     78094.23485
144     77921.46865
145     77758.40858
146     77591.65301
147     77425.91937
148     77257.31471
149     77075.78451
150     76907.31623
151     76742.91998
152     76561.9087
153     76380.34463
154     76217.64448
155     76045.98488
156     75886.34457
157     76006.2876
158     75838.75124
159     75668.5868
160     75500.17295
161     75321.13076
162     75173.77906
163     75008.63919
164     74866.05503
165     74729.17832
166     74562.26775
167     74439.98741
168     74300.68973
169     74163.91122
170     73998.86733
171     73855.24838
172     73721.23904
173     73573.65304
174     73414.80647
175     73276.25392
176     73130.78816
177     72981.46278
178     72852.35505
179     72720.96864
180     72573.00981
181     72442.27472
182     72306.35706
183     72162.12077
184     72037.29362
185     71911.28561
186     71773.09122
187     71642.90453
188     71544.09864
189     71442.18396
190     71347.40422
191     71252.98716
192     71145.75328
193     71072.64149
194     70978.74376
195     70868.34239
196     70763.42668
197     70672.20644
198     70576.38583
199     70485.34964
200     70388.27422
201     70308.3394
202     70221.55658
203     70123.806
204     70035.8303
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APPENDIX  B 


ADJUSTED  UNIT  LEARNING  CURVE  DATA 


C-141 DATA

INDEP          DEPEN
2.3803364      423225.4
8.24566607     302067.7167
16.139209      221192.47
28.54110847    167305.66
50.46156288    126584.7767
79.92929653    105498.0464
108.0780045    88501.76429
136.165075     82012.175
167.0975761    80610.36765
200.68436      77449.41515
233.7289909    73799.77879
267.2484024    83833.23824

F-102 DATA

INDEP   DEPEN

1       198785.6875
2       183653.6733
3       173202.1294
4       139945.124
5       144174.516
6       128234.9949
7       161816.879
8       131832.5362
9       144017.6192
10      109962.4866
11      127145.2443
12      106302.5488
13      135992.8595
14      94766.5199
15      100592.1663
16      87148.6196
17      111182.7003
18      110673.6384
19      100076.2828
20      89130.8264
21      103971.4164
22      109546.369
23      99676.3665
24      140483.1777
25      94674.4283
26      98457.8582
27      90839.1148
28      87555.7956
29      84792.5512
30      91577.4219
31      81525.8892
32      101980.3619
33      92850.503
34      83437.7656
35      96770.3649
36      93006.4016
37      80361.736
38      93021.208
39      80692.1038
40      86813.6248
41      98781.8842
42      90434.715
43      91907.9518
44      80029.5174
45      85507.8854
46      80352.482
47      78175.0158
48      75640.3452
49      127695.946
50      75757.871
51      73883.0106
52      73437.8932
53      73709.0354
54      71816.5924
55      69494.7638
56      81980.2606
57      69876.954
58      76495.4148
59      79387.2898
60      72425.5056
61      67739.3
62      71225.2618
63      69370.7602
64      69060.7512
65      70893.0432
66      70042.6006
67      66903.7
68      68478.59
69      66610.4445
70      66624.2125
71      67279.9135
72      95442.9798
73      65435.862
74      65272.367
75      64552.1285
76      72800.2926
77      63106.4885
78      62744.218
79      59992.339
80      62034.3055
81      59217.889
82      62073.8885
83      62042.9105
84      61382.0465
85      61024.0785
86      62787.243
87      60122.2745
88      80447.7982
89      60144.6475
90      60620.504
91      61517.145
92      61168.6425
93      87040.3478
94      59856.38
95      83991.1548
96      59705.7925
97      62412.065
98      60787.441
99      55472.1325
100     56837.746
101     56812.7915
102     56176.882
103     55271.636
104     55670.0475
105     56365.3315
106     56182.045
107     55429.1075
108     55303.4745
109     56273.258
110     56274.979
111     56575.2935
112     56218.186
113     54436.951
114     52726.277
115     55346.4995
116     55754.3765
117     52722.835
118     56731.044
119     71328.9066
120     54251.083
121     52356.262
122     54257.967
123     55719.9565
124     54252.804
125     55871.4045
126     53800.181
127     54091.8905
128     52092.949
129     52202.2325
130     78191.673
131     77544.8184
132     81334.3314
133     52808.885
134     52272.7935
135     54060.052
136     49983.8635
137     55029.8355
138     51176.5165
139     51044.86
140     52679.81
141     53745.109
142     51677.3275
143     53705.526
144     53215.9015
145     54277.7585
146     53412.0955
147     53228.809
148     52472.4295
149     50209.3145
150     51805.542
151     52083.4835
152     49229.205
153     48782.6055
154     51324.5225
155     49610.4065
156     51142.0965
157     94717.4
158     49535.543
159     48782.6055
160     48722.3705
161     46674.3805
162     51450.1555
163     48255.9795
164     51624.837
165     52281.3985
166     47022.0225
167     54141.4524
168     51037.976
169     51185.1215
170     46106.4505
171     49440.0275
172     50805.641
173     48188.8605
174     45934.3505
175     49168.1095
176     47674.2815
177     46700.1955
178     50000.2874
179     49334.186
180     46088.38
181     48909.9595
182     47705.2595
183     45911.117
184     49193.9245
185     48725.8125
186     46207.129
187     47428.1785
188     53067.399
189     52282.2223
190     53434.0345
191     53313.7451
192     50664.0827
193     57035.1782
194     52856.4806
195     49450.478
196     50304.8623
197     52793.0403
198     51699.725
199     52460.1847
200     51070.2654
201     54321.3748
202     52778.2101
203     50378.1894
204     52176.7631
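
As an illustration of how the C-141 unit data at the start of this appendix can be put through a log-linear regression, the following is a minimal sketch only. It assumes the unit learning curve form Y = b0 * X^b1 with INDEP as X and DEPEN as Y, fits ln Y on ln X by ordinary least squares, and computes a Durbin-Watson statistic from the residuals. The function names are hypothetical, and because the sketch uses all twelve rows listed above, its estimates are not expected to reproduce the results reported in Appendix E.

```python
import math

# C-141 unit learning curve data (INDEP, DEPEN) as listed in this appendix.
C141_UNIT = [
    (2.3803364, 423225.4), (8.24566607, 302067.7167),
    (16.139209, 221192.47), (28.54110847, 167305.66),
    (50.46156288, 126584.7767), (79.92929653, 105498.0464),
    (108.0780045, 88501.76429), (136.165075, 82012.175),
    (167.0975761, 80610.36765), (200.68436, 77449.41515),
    (233.7289909, 73799.77879), (267.2484024, 83833.23824),
]


def fit_log_linear(data):
    """OLS fit of ln(y) = ln(b0) + b1*ln(x); returns (ln_b0, b1, residuals)."""
    xs = [math.log(x) for x, _ in data]
    ys = [math.log(y) for _, y in data]
    n = len(data)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    ln_b0 = y_bar - b1 * x_bar
    residuals = [y - (ln_b0 + b1 * x) for x, y in zip(xs, ys)]
    return ln_b0, b1, residuals


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences
    of the residuals divided by the residual sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e * e for e in residuals)
    return num / den


ln_b0, b1, resid = fit_log_linear(C141_UNIT)
dw = durbin_watson(resid)
print(f"ln b0 = {ln_b0:.3f}, b1 = {b1:.4f}, D.W. = {dw:.2f}")
```

A Durbin-Watson value well below 2 would point to positive autocorrelation in the residuals, which is the condition the study found to distort the linear models.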
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APPENDIX  C 

ADJUSTED  BOGER  ET  AL  LEARNING  CURVE  DATA 
F-102 DATA

INDEP      DEPEN
0          33688.1
1          47146.3
2.38077    56338.9
4.56154    45254
7.60362    52550.1
11.5382    62859.9
15.9344    52671
21.2575    58556.1
23.7728    50268.1
27.3882    54428.4
31.5228    38313.2
37.7959    40203.7
42.9844    40016.9
49.0382    37349.3
57.1344    34468.3
66.1921    33106.3
74.8959    26587.1
85.1844    26044.2
108.261    26926
131.723    30284.2
144.223    27512.2
152.915    17550
157.761    10236.9

C-141 DATA

INDEP      DEPEN
0          423226
1          423234
3.79604    398778
9.77984    286633
17.1062    360400
27.5982    313197
45.3044    271292
71.6374    212284
106.329    187897
145.647    166776
199.061    146092
270.786    122850
379.058    110953
505.918    96014.4
655.962    86009.6
817.355    81303.6
971.643    79377.5
1140.14    78052.2
1308.93    76578.5
1444.26    79039.1
1551.81    82908.3
1629.37    83169.5
1658.49    77371.7
1663.95    75771


APPENDIX  D 
BOGER  ET  AL  MODEL:   C-141  DATA  ANALYSIS  RESULTS 

RAW  DATA:  18  OBSERVATIONS 
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Figure D-1.  Boger et al. Specification:  C-141 Data
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APPENDIX  E 
UNIT  LEARNING  CURVE:   C-141  DATA  ANALYSIS  RESULTS 

RAW  DATA:  9  OBSERVATIONS 
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Figure E-1.  Unit Learning Curve:  C-141 Data
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TABLE E-1
LINEAR REGRESSION RESULTS

                      ln β0             β1                D.W.   N   ρ       R2    R2 adj
ln Transformed Data   13.414 (228.12)   -.4201 (-27.27)   1.69   9   .1121   .92   .91


TABLE E-2
NONLINEAR REGRESSION RESULTS

            β0          β1      D.W.   N   ρ       R2
Raw Data    613600.37   -.387   1.64   9   .1374   .996
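
To make the two sets of estimates easier to compare, the sketch below converts each exponent into the conventional learning rate (the cost ratio each time cumulative output doubles) and evaluates both fitted curves at one output level. It is an illustration under stated assumptions, not part of the original analysis: it reads the first two entries of Tables E-1 and E-2 as the intercept and exponent of the unit learning curve Y = b0 * X^b1, the function names are hypothetical, and the evaluation point X = 100 is arbitrary.

```python
import math

# Estimates as printed in Tables E-1 and E-2 (C-141 unit learning curve).
# Assumption: the first entry is the intercept term, the second the exponent.
LN_B0_LINEAR, B1_LINEAR = 13.414, -0.4201    # fit on ln-transformed data
B0_NONLINEAR, B1_NONLINEAR = 613600.37, -0.387  # fit on raw data


def learning_rate(b1):
    """Cost ratio when cumulative output doubles, for Y = b0 * X**b1."""
    return 2.0 ** b1


def unit_cost(b0, b1, x):
    """Predicted cost at output level x under Y = b0 * X**b1."""
    return b0 * x ** b1


# Back-transform the log-linear intercept to the cost scale.
b0_linear = math.exp(LN_B0_LINEAR)

rate_linear = learning_rate(B1_LINEAR)
rate_nonlinear = learning_rate(B1_NONLINEAR)

x = 100.0  # arbitrary output level chosen for illustration
print(f"linear fit:    rate = {rate_linear:.4f}, "
      f"cost at X=100: {unit_cost(b0_linear, B1_LINEAR, x):,.0f}")
print(f"nonlinear fit: rate = {rate_nonlinear:.4f}, "
      f"cost at X=100: {unit_cost(B0_NONLINEAR, B1_NONLINEAR, x):,.0f}")
```

Under these assumptions the two exponents correspond to learning rates of roughly 75 and 76 percent, so the linear and nonlinear fits describe similar curves for this data set.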
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APPENDIX  F 
CUMULATIVE  AVERAGE  LEARNING  CURVE:   C-141  DATA  ANALYSIS  RESULTS 


RAW  DATA:  9  OBSERVATIONS 
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Figure F-1.  Cumulative Average Learning Curve:  C-141 Data
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APPENDIX  G 
BOGER  ET  AL  MODEL:   F-102  DATA  ANALYSIS  RESULTS 

RAW  DATA:  18  OBSERVATIONS 
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APPENDIX  H 
UNIT  LEARNING  CURVE:   F-102  DATA  ANALYSIS  RESULTS 


RAW  DATA:   175  OBSERVATIONS 
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APPENDIX  I 
CUMULATIVE  AVERAGE  LEARNING  CURVE:  F-102  DATA  ANALYSIS  RESULTS 

RAW  DATA:  179  OBSERVATIONS
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Figure I-1.  Cumulative Average Learning Curve:  F-102 Data
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APPENDIX  J 
FITTED  MODEL  PLOTS 
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APPENDIX  K 
FITTED  LOT  COST  PLOTS 
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